Unsupervised Information Extraction for Finding Gene Functions

semanticscholar(2013)

Abstract
Finding gene functions discussed in the literature is essential to information extraction from biomedical documents. Automated computational methods can significantly reduce the need for manual curation and improve the quality of related Information Extraction (IE) systems. We propose an open information extraction method for the BioCreative IV GO shared task (Subtask b), which aims to find gene function terms (GO terms) for the different genes in an article. The proposed open IE approach is based on distributional semantic similarity over the Gene Ontology terms. The method does not require annotated training data, which makes it highly generalizable. We achieve an F-measure of 0.26 on the test set in the official submission for the BioCreative-GO shared task.

Introduction

Text mining of biomedical literature aims to reduce manual labor and provide richer information to support research and medical treatment. Lu et al. (1) demonstrated that there is increasing interest in using text mining techniques in curation workflows. Currently, literature curation struggles with a lack of automated annotation techniques, particularly for gene ontology annotations (1). As medical technology advances and more curation sources become available, this need grows. In medical informatics alone, the number of indexed articles increased by an average of 12% each year between 1987 and 2006 (2). With a growing number of publications detailing increasingly complex information, the need for reliable and generalizable computational techniques increases rapidly.

Finding gene functions discussed in the literature is crucial to genomic information extraction. Currently, tagging gene functions in published literature is a mainly manual process. Curators find gene function evidence by reviewing each sentence in an article and mapping the results to gene ontologies. Gene Ontology (GO) (3) is a controlled vocabulary that defines gene product functions.
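To make the idea of distributional semantic similarity over GO terms more concrete, the following is a minimal sketch and not the method used in the submission. It assumes a toy corpus, a three-term subset of GO, and simple co-occurrence count vectors compared by cosine similarity; the function names (`cooccurrence_vectors`, `phrase_vector`, `rank_go_terms`) and the example sentences are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphabetic runs only (a simplifying assumption).
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence_vectors(sentences, window=2):
    # Each word is represented by counts of the words seen within `window`
    # positions of it: a very small distributional model.
    vectors = defaultdict(Counter)
    for sent in sentences:
        toks = tokenize(sent)
        for i, word in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][toks[j]] += 1
    return vectors


def phrase_vector(phrase, vectors):
    # Sum the context vectors of the words in the phrase, and count each
    # word itself so that exact lexical overlap also contributes.
    total = Counter()
    for word in tokenize(phrase):
        total.update(vectors.get(word, {}))
        total[word] += 1
    return total


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_go_terms(sentence, go_terms, vectors):
    # Score every GO term name against the sentence, best match first.
    sent_vec = phrase_vector(sentence, vectors)
    scored = [
        (cosine(sent_vec, phrase_vector(name, vectors)), go_id, name)
        for go_id, name in go_terms.items()
    ]
    return sorted(scored, reverse=True)


# Toy, hypothetical corpus used only to build the co-occurrence vectors.
corpus = [
    "DNA repair fixes damaged DNA after radiation",
    "apoptotic process leads to programmed cell death",
    "translation produces protein from messenger RNA",
]

# A tiny illustrative subset of GO term names.
go_terms = {
    "GO:0006281": "DNA repair",
    "GO:0006915": "apoptotic process",
    "GO:0006412": "translation",
}

vectors = cooccurrence_vectors(corpus)
ranked = rank_go_terms("The enzyme helps repair damaged DNA", go_terms, vectors)
for score, go_id, name in ranked:
    print(f"{go_id}\t{name}\t{score:.3f}")
```

Because no labeled examples are involved, a sketch like this needs only raw text and the ontology's term names, which is the property the abstract highlights as making the approach generalizable. A realistic system would build its vectors from a much larger corpus.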
BioCreative IV is a National Institutes of Health (NIH) workshop that aims to automate gene function curation through computational methods. With a focus on gene functions, it includes two subtasks: a) retrieving GO evidence sentences for relevant genes, and b) predicting GO terms for relevant genes. We focus on subtask b, which finds the related gene functions (GO terms) for a set of genes discussed in an article. More details about the shared task and the corpus can be found in Van Auken et al. (4). This task is very similar to BioCreative I subtask