Title Validation and extraction of molecular-geometry information from small-molecule databases
Authors Long, Fei ; Nicholls, Robert A ; Emsley, Paul ; Gražulis, Saulius ; Merkys, Andrius ; Vaitkus, Antanas ; Murshudov, Garib N
DOI 10.1107/S2059798317000079
Is Part of Acta crystallographica. Section D : structural biology. Chester : International Union of Crystallography. 2017, Vol. 73, Part 2, p. 103-111. ISSN 2059-7983
Keywords [eng] validation ; high-order statistics ; Crystallography Open Database.
Abstract [eng] A freely available small-molecule structure database, the Crystallography Open Database (COD), is used for the extraction of molecular-geometry information on small-molecule compounds. The results are used for the generation of new ligand descriptions, which are subsequently used by macromolecular model-building and structure-refinement software. To increase the reliability of the derived data, and therefore the new ligand descriptions, the entries from this database were subjected to very strict validation. The selection criteria made sure that the crystal structures used to derive atom types, bond and angle classes are of sufficiently high quality. Any suspicious entries at a crystal or molecular level were removed from further consideration. The selection criteria included (i) the resolution of the data used for refinement (entries solved at 0.84 Å resolution or higher) and (ii) the structure-solution method (structures must be from a single-crystal experiment and all atoms of generated molecules must have full occupancies), as well as basic sanity checks such as (iii) consistency between the valences and the number of connections between atoms, (iv) acceptable bond-length deviations from the expected values and (v) detection of atomic collisions. The derived atom types and bond classes were then validated using high-order moment-based statistical techniques. The results of the statistical analyses were fed back to fine-tune the atom typing. The developed procedure was repeated four times, resulting in fine-grained atom typing, bond and angle classes. The procedure will be repeated in the future as and when new entries are deposited in the COD. The whole procedure can also be applied to any source of small-molecule structures, including the Cambridge Structural Database and the ZINC database.
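Illustration (not part of the published record): selection criteria (i) and (ii) from the abstract, followed by a moment-based check on a bond class, could be sketched as below. This is a minimal sketch under stated assumptions. The `Entry` fields and both function names are hypothetical, and skewness and excess kurtosis are used only as one common choice of high-order moments, since the abstract does not say which statistics the authors applied.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Entry:
    """Hypothetical summary of one database entry (field names are illustrative)."""
    resolution: float         # in Å; a smaller value means higher resolution
    single_crystal: bool      # True if the structure comes from a single-crystal experiment
    occupancies: list[float]  # occupancy of every atom in the generated molecules


MAX_RESOLUTION = 0.84  # Å, the threshold quoted in the abstract


def passes_selection(entry: Entry) -> bool:
    """Apply criteria (i) and (ii): resolution, single crystal, full occupancies."""
    fully_occupied = all(abs(occ - 1.0) < 1e-6 for occ in entry.occupancies)
    return (
        entry.resolution <= MAX_RESOLUTION
        and entry.single_crystal
        and fully_occupied
    )


def skew_and_excess_kurtosis(values: list[float]) -> tuple[float, float]:
    """Return (skewness, excess kurtosis) from population central moments.

    Assumed stand-in for the "high-order moment-based" validation: values far
    from zero suggest that a bond class mixes more than one distribution.
    """
    mean = fmean(values)
    deviations = [v - mean for v in values]
    m2 = fmean([d ** 2 for d in deviations])
    m3 = fmean([d ** 3 for d in deviations])
    m4 = fmean([d ** 4 for d in deviations])
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0
```

In a loop of the kind the abstract describes, bond classes flagged by such statistics would be split by refining the atom types, and the check would then be rerun.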
Published Chester : International Union of Crystallography
Type Journal article
Language English
Publication date 2017