Title Procedure and datasets to compute links between genes and phenotypes defined by MeSH keywords
Authors Pranckevičienė, Erinija
DOI 10.12688/f1000research.6140.1
Is Part of F1000Research. London : F1000 Research Ltd, 2015, vol. 4, art. no. 47, p. 1-14. eISSN 2046-1402
Keywords [eng] ontology ; medical subject headings ; MySQL ; annotation ; phenotypes
Abstract [eng] Algorithms that mine relationships between genes and phenotypes fall into several overlapping categories, depending on how the phenotype is defined: by training genes known to be related to the phenotype, by keywords, or through algorithms designed specifically for disease phenotypes. This work outlines an algorithm for linking phenotypes to Gene Ontology (GO) annotations that does not require training genes and is based on the algorithmic principles of the Genes to Diseases (G2D) gene prioritization tool. In the outlined algorithm, phenotypes are defined by Medical Subject Headings (MeSH) terms. GO annotations are linked to phenotypes through intermediate MeSH D terms (drugs and chemicals). This inference uses the mathematical framework of fuzzy binary relations, which is based on fuzzy set theory. The strength of the relationship between two terms is defined by the frequency with which the pair co-occurs in PubMed articles and by the frequency of association between GO annotations and MeSH D terms in the NCBI Gene gene2go and gene2pubmed datasets. Three plain tab-delimited datasets required by the algorithm are contributed to support the computations. These datasets can be imported into a relational MySQL database: MySQL statements to create the tables are provided, and a MySQL procedure implementing the computations of the outlined algorithm is listed. The plain tab-delimited format of the contributed tables makes it easy to use these datasets in other applications.
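The two-step inference described in the abstract (phenotype to MeSH D term, then MeSH D term to GO annotation) can be sketched in Python as a composition of fuzzy binary relations. This is a minimal illustration under stated assumptions, not the published MySQL procedure. The three-column layout (term, term, count) of the input text, the row-wise normalization of counts into membership degrees, and the use of max-min composition are all assumptions. The function names and the term identifiers in the usage example are hypothetical toy values, not real data.

```python
import csv
import io
from collections import defaultdict


def read_counts(tsv_text):
    """Read tab-delimited rows of (left term, right term, count).

    ASSUMPTION: this 3-column layout is hypothetical and may not match
    the columns of the contributed datasets.
    """
    counts = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        left, right, n = row
        counts[(left, right)] = int(n)
    return counts


def to_fuzzy_relation(counts):
    """Turn raw counts into membership degrees in [0, 1].

    ASSUMPTION: each count is divided by the largest count observed for
    the same left-hand term; the paper may normalize differently.
    """
    row_max = defaultdict(int)
    for (left, _), n in counts.items():
        row_max[left] = max(row_max[left], n)
    return {
        (left, right): n / row_max[left]
        for (left, right), n in counts.items()
        if row_max[left] > 0
    }


def max_min_compose(r, s):
    """Max-min composition: (R o S)(x, z) = max over y of min(R(x, y), S(y, z)).

    Pairs whose composed degree would be 0 are simply not stored.
    """
    s_by_left = defaultdict(list)
    for (y, z), v in s.items():
        s_by_left[y].append((z, v))
    result = {}
    for (x, y), u in r.items():
        for z, v in s_by_left.get(y, []):
            degree = min(u, v)
            if degree > result.get((x, z), 0.0):
                result[(x, z)] = degree
    return result


# Hypothetical toy counts: phenotype-to-chemical co-occurrences and
# chemical-to-GO associations (illustrative values only).
PHEN_CHEM = "phenotype_A\tchem_1\t30\nphenotype_A\tchem_2\t10\n"
CHEM_GO = "chem_1\tGO_term_1\t5\nchem_1\tGO_term_2\t20\nchem_2\tGO_term_2\t8\n"

links = max_min_compose(
    to_fuzzy_relation(read_counts(PHEN_CHEM)),
    to_fuzzy_relation(read_counts(CHEM_GO)),
)
print(links)  # {('phenotype_A', 'GO_term_1'): 0.25, ('phenotype_A', 'GO_term_2'): 1.0}
```

Max-min composition is used here because it is the textbook operator for chaining fuzzy relations. A product-based or other composition could be swapped into `max_min_compose` if the paper's procedure uses a different operator.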
Published London : F1000 Research Ltd
Type Journal article
Language English
Publication date 2015
CC license