Occurrence of Gene Ontology, Protein Ontology, and NCBI Taxonomy Concepts in Text toward Automatic Gene Ontology Annotation of Genes and Gene Products

semanticscholar(2013)

Abstract
Annotations of genes and gene products in model-organism databases with Gene Ontology (GO) terms have become an important knowledge resource in biomedical research, which has spurred many efforts at automating this labor-intensive manual curatorial activity, including many text-mining approaches. In an effort to provide some guidance on these text-mining efforts, we have used a gold-standard manually annotated corpus to evaluate the occurrence of three types of fundamental GO-annotation concepts in 34 journal articles that were the evidential bases of approximately 220 GO annotations largely created by the Mouse Genome Informatics (MGI) group. In addition to an analysis of the occurrence of the GO concepts of the curated GO annotations associated with these articles in the corpus, we have analyzed the occurrence of NCBI Taxonomy (NCBITAXON) and Protein Ontology (PRO) concepts corresponding to the species-specific genes/gene products of these curated GO annotations. The GO, NCBITAXON, and PRO concepts corresponding to the curated GO annotations were analyzed both in the full-text versions of these articles and in only those sentences of the articles providing the strongest evidence for the GO annotations, as specified by an official MGI GO curator. Though this sample set may not necessarily be representative of all GO annotations, our analysis suggests that full-text articles mention substantial fractions of the GO concepts at least once; however, the mentions of these GO concepts constitute very low percentages of the mentions of all GO concepts in these articles. Nearly all PRO concepts corresponding to GO annotations are mentioned at least once in the full articles, and these PRO mentions constitute a substantial fraction of the mentions of all PRO concepts in these articles.
Mus musculus is seldom mentioned, though mice (strictly corresponding to the genus Mus) are mentioned at least once in the full articles, and these Mus mentions also constitute a substantial fraction of the mentions of all NCBITAXON concepts in these articles. For all of the ontologies, counts of annotated concepts corresponding to the curated GO annotations in only the strongly evidential sentences are comparatively very low, amounting to several mentions or fewer per article. However, for most of the ontologies, concepts corresponding to the curated GO annotations appear overrepresented, though this must be viewed cautiously given that this is based on very low counts. Thus, it remains to be further examined whether this overrepresentation overrides the very low mention frequency and thus whether it would be beneficial for automatic GO-annotation systems to focus on these evidential sentences.