A Comprehensive Comparison of Two MEDLINE Annotators for Disease and Gene Linkage: Sometimes Less is More.

Sarah ElShal,Jaak Simm,Adam Arany,Pooya Zakeri,Jesse Davis,Yves Moreau

BIOINFORMATICS AND BIOMEDICAL ENGINEERING (IWBBIO 2016)（2016）

引用 3|浏览18

暂无评分

摘要

Text mining is popular in biomedical applications because it allows retrieving highly relevant information. Particularly for us, it is quite practical in linking diseases to the genes involved in them. However text mining involves multiple challenges, such as (1) recognizing named entities (e.g., diseases and genes) inside the text, (2) constructing specific vocabularies that efficiently represent the available text, and (3) applying the correct statistical criteria to link biomedical entities with each other. We have previously developed Beegle, a tool that allows prioritizing genes for any search query of interest. The method starts with a search phase, where relevant genes are identified via the literature. Once known genes are identified, a second phase allows prioritizing novel candidate genes through a data fusion strategy. Many aspects of our method could be potentially improved. Here we evaluate two MEDLINE annotators that recognize biomedical entities inside a given abstract using different dictionaries and annotation strategies. We compare the contribution of each of the two annotators in associating genes with diseases under different vocabulary settings. Somewhat surprisingly, with fewer recognized entities and a more compact vocabulary, we obtain better associations between genes and diseases. We also propose a novel but simple association criterion to link genes with diseases, which relies on recognizing only gene entities inside the biomedical text. These refinements significantly improve the performance of our method.

查看译文

关键词

Text Mining, Latent Dirichlet Allocation, Name Entity Recognition, Latent Semantic Indexing, Disease Ontology

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要