Title |
Procedure and datasets to compute links between genes and phenotypes defined by MeSH keywords |
Authors |
Pranckevičienė, Erinija |
DOI |
10.12688/f1000research.6140.1 |
Is Part of |
F1000Research. London : F1000 Research Ltd. 2015, vol. 4, art. no. 47, p. 1-14. eISSN 2046-1402 |
Keywords [eng] |
ontology ; medical subject headings ; MySQL ; annotation ; phenotypes |
Abstract [eng] |
Algorithms for mining relationships between genes and phenotypes can be classified into several overlapping categories based on how a phenotype is defined: those that use training genes known to be related to the phenotype, those that use keywords, and those designed to work with disease phenotypes. This work outlines an algorithm for linking phenotypes to Gene Ontology (GO) annotations that does not require training genes and is based on the algorithmic principles of the Genes to Diseases (G2D) gene prioritization tool. In the outlined algorithm, phenotypes are defined by Medical Subject Headings (MeSH) terms. GO annotations are linked to phenotypes through intermediate MeSH D terms (drugs and chemicals). This inference uses the mathematical framework of fuzzy binary relations from fuzzy set theory. The strength of a relationship between terms is defined through the frequency of co-occurrence of pairs of terms in PubMed articles and the frequency of association between GO annotations and MeSH D terms in the NCBI Gene gene2go and gene2pubmed datasets. Three plain tab-delimited datasets required by the algorithm are contributed to support the computations. These datasets can be imported into a relational MySQL database. MySQL statements to create the tables are provided, and a MySQL procedure implementing the computations performed by the outlined algorithm is listed. The plain tab-delimited format of the contributed tables makes it easy to use these datasets in other applications. |
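The chained inference described in the abstract (phenotype to MeSH D term to GO annotation) can be illustrated with a composition of two fuzzy binary relations. The sketch below is only an illustration under stated assumptions: it uses max-min composition, which is one standard choice in fuzzy set theory and may not be the composition rule used in the paper, and all term names and strength values are hypothetical toy data, not values from the contributed datasets.

```python
# Hypothetical toy strengths in [0, 1]; NOT taken from the contributed datasets.
# Relation 1: phenotype (MeSH term) -> intermediate MeSH D term (drug/chemical).
phenotype_to_chemical = {
    ("phenotype_A", "chemical_X"): 0.6,
    ("phenotype_A", "chemical_Y"): 0.2,
}
# Relation 2: intermediate MeSH D term -> GO annotation.
chemical_to_go = {
    ("chemical_X", "GO_term_1"): 0.5,
    ("chemical_Y", "GO_term_1"): 0.9,
    ("chemical_X", "GO_term_2"): 0.3,
}


def compose_max_min(r1, r2):
    """Max-min composition of two fuzzy binary relations stored as dicts.

    Assumption: max-min is used here for illustration only; the paper's
    actual composition rule may differ.
    """
    result = {}
    for (a, b1), s1 in r1.items():
        for (b2, c), s2 in r2.items():
            if b1 != b2:
                continue  # the two links must share the intermediate term
            strength = min(s1, s2)  # a path is as strong as its weakest link
            if strength > result.get((a, c), 0.0):
                result[(a, c)] = strength  # keep the strongest path
    return result


phenotype_to_go = compose_max_min(phenotype_to_chemical, chemical_to_go)
for (phenotype, go_term), strength in sorted(phenotype_to_go.items()):
    print(phenotype, go_term, strength)
```

In this toy case the link from `phenotype_A` to `GO_term_1` takes its strength from the path through `chemical_X`, because the path through `chemical_Y` is limited by its weak first link.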
Published |
London : F1000 Research Ltd |
Type |
Journal article |
Language |
English |
Publication date |
2015 |