The GNAT library for local and remote gene mention normalization.

Bioinformatics (2011)

Abstract
Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the Gnat Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component to be integrated with other text-mining systems, as a framework to add user-specific extensions, and as an efficient stand-alone application for the identification of gene and protein names for data analysis. On the BioCreative III test data, the current version of Gnat achieves a Tap-20 score of 0.1987. The library and web services are implemented in Java and the sources are available from http://gnat.sourceforge.net. Contact: jorg.hakenberg@roche.com.
Keywords
biomedical text,biomedical text mining,remote gene mention normalization,gnat java library,text retrieval,gnat library,biocreative iii test data,data mining pipeline,data analysis,entity recognition,web service,entity normalization system,genes,proteins,data mining,internet,publishing,gene library