Title Kristalografijos duomenų kokybės užtikrinimas naudojant interaktyvias ir automatines validacijos priemones
Translation of Title Ensuring quality of crystallographic data with the help of interactive and automatic validation tools.
Authors Kaltenytė, Monika
Pages 62
Abstract [eng] Crystallography is the science that studies the structure of crystalline solids and provides an understanding of the internal structure of materials. Its achievements have opened doors in many areas of science, including drug design and physics. Crystallography continues to evolve: the Crystallography Open Database (COD) now holds more than 500,000 structures. However, as the number of structures grows, ensuring data validity becomes a bigger challenge. To prevent invalid data, it is important to involve the crystallography community in the review of crystal structures, so that crystallographers who have doubts about a particular structure can discuss it and accept or reject it based on the outcome. The goal of this thesis is therefore to enable crystallographers and other scientists to view and efficiently validate crystal structures submitted to the database. To this end, the thesis examines current methods and how human expert input can be combined with automated assessment tools to ensure high data quality, and it creates a unified system that incorporates both automated tools, which reduce the number of structures that need review, and scientists’ input, which improves the overall quality of crystal structures. The work reviews several crystallography databases, how the data in them is validated and with which tools, and discusses how peer review and decision support systems can be used together with automated tools to make decisions. The main problems of crystallographic data publication were identified. The most common are syntax and semantic errors, which in most cases can be fixed easily by automated tools. There is also a risk of “honest mistakes” and inaccuracies that result in poor data quality. Rarer but more damaging is fraud: fraudulent structures undermine scientific findings, are hard to detect, and are time-consuming and expensive to deal with.
Automated tools usually cannot detect fraud, so human intervention is needed. Several levels of formal checks for crystal structures were reviewed. Automated checks include syntax validation, validation against dictionaries, and checks against additional subject-area criteria. Syntax validation should come first: it ensures that the file has no basic errors such as misspelled keywords. The second step, dictionary validation, verifies that the data is consistent with known crystallographic data. Finally, the data is checked against additional subject-area criteria that depend on the specific context. Running the checks in this order optimizes the review process: structures should reach scientists as late in the pipeline as possible, so that reviewers do not have to check anything automated tools could have detected and can focus their time on crucial decisions. A system that combines automated tools with human reviewers’ insights has been developed, deployed to a test server, and integrated with the existing infrastructure. To verify its accuracy and reliability, it was tested with unit tests and with CIFs (Crystallographic Information Files) currently in COD. Although the system has not yet been field-tested, it shows promise for effectively reviewing structures and ensuring their quality in real-world use. Further refinement and testing could save scientists considerable time and money that could be devoted to other tasks leading to advances in various domains, including science, engineering, and business.
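The three-stage ordering described above (syntax, then dictionary, then subject-area criteria, with humans last) can be illustrated with a small Python sketch. This is not the system developed in the thesis. It assumes a tiny subset of CIF syntax (a single `data_` block with one `_tag value` pair per line, no `loop_` or multi-line text fields), a hypothetical seven-item dictionary, and a hypothetical routing policy: submissions with errors at any stage go back to the submitter without consuming reviewer time, submissions that pass every stage but carry warnings are queued for human review, and clean ones are accepted. The function names, the `LARGE_CELL` threshold, and the decision labels are all illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical mini-dictionary (tag -> expected value type). Real CIF
# dictionaries such as the IUCr core dictionary define far more items.
DICTIONARY = {f"_cell_length_{x}": float for x in "abc"}
DICTIONARY.update({f"_cell_angle_{x}": float for x in ("alpha", "beta", "gamma")})
DICTIONARY["_symmetry_space_group_name_H-M"] = str
LARGE_CELL = 100.0  # illustrative threshold in angstroms; only raises a warning


@dataclass
class Report:
    stage: str  # last stage that was run
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def check_syntax(text):
    """Stage 1: accept one data_ block header followed by '_tag value' lines."""
    errors, items, header_seen = [], {}, False
    for n, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are allowed anywhere
        if not header_seen:
            if line.startswith("data_"):
                header_seen = True
                continue
            errors.append(f"line {n}: expected a data_ block header")
            return errors, items
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].startswith("_"):
            errors.append(f"line {n}: expected '_tag value'")
        else:
            items[parts[0]] = parts[1].strip("'\"")
    if not header_seen:
        errors.append("missing data_ block header")
    return errors, items


def parse_number(value):
    """Parse a numeric CIF value, dropping a standard uncertainty like '5.4307(2)'."""
    return float(value.split("(", 1)[0])


def check_dictionary(items):
    """Stage 2: compare tags and value types with the dictionary."""
    errors, warnings, values = [], [], {}
    for tag, value in items.items():
        expected = DICTIONARY.get(tag)
        if expected is None:
            warnings.append(f"{tag}: not defined in the dictionary")
        elif value in ("?", "."):
            continue  # CIF placeholders for unknown / inapplicable values
        elif expected is float:
            try:
                values[tag] = parse_number(value)
            except ValueError:
                errors.append(f"{tag}: '{value}' is not a number")
        else:
            values[tag] = value
    return errors, warnings, values


def check_domain(values):
    """Stage 3: subject-area plausibility checks on the unit cell."""
    errors, warnings = [], []
    for axis in "abc":
        length = values.get(f"_cell_length_{axis}")
        if length is None:
            continue
        if length <= 0:
            errors.append(f"cell length {axis} must be positive")
        elif length > LARGE_CELL:
            warnings.append(f"cell length {axis} = {length} is unusually large")
    for name in ("alpha", "beta", "gamma"):
        angle = values.get(f"_cell_angle_{name}")
        if angle is not None and not 0 < angle < 180:
            errors.append(f"cell angle {name} must lie strictly between 0 and 180")
    return errors, warnings


def review(text):
    """Run the stages in order; humans only see submissions that pass every
    automated stage but still carry warnings."""
    errors, items = check_syntax(text)
    if errors:
        return "return_to_submitter", Report("syntax", errors)
    errors, warnings, values = check_dictionary(items)
    if errors:
        return "return_to_submitter", Report("dictionary", errors, warnings)
    domain_errors, domain_warnings = check_domain(values)
    warnings += domain_warnings
    if domain_errors:
        return "return_to_submitter", Report("domain", domain_errors, warnings)
    if warnings:
        return "human_review", Report("domain", [], warnings)
    return "accept", Report("domain")


if __name__ == "__main__":
    cif = "data_example\n_cell_length_a 5.4307(2)\n_cell_angle_alpha 90\n_my_local_tag 42\n"
    decision, report = review(cif)
    print(decision, report.warnings)  # human_review ['_my_local_tag: not defined in the dictionary']
```

Because each stage returns as soon as it finds errors, a file with a malformed line never reaches the dictionary or subject-area checks, and a human reviewer only sees what the automated stages could not settle on their own.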
Dissertation Institution Vilniaus universitetas.
Type Master thesis
Language Lithuanian
Publication date 2023