MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction.

Wenhao Gu,Xiao Yang, Minhao Yang,Kun Han, Wenying Pan,Zexuan Zhu

Bioinformatics advances(2022)

引用 4|浏览5
暂无评分
摘要
Motivation:Natural language processing (NLP) tasks aim to convert unstructured text data (e.g. articles or dialogues) to structured information. In recent years, we have witnessed fundamental advances of NLP technique, which has been widely used in many applications such as financial text mining, news recommendation and machine translation. However, its application in the biomedical space remains challenging due to a lack of labeled data, ambiguities and inconsistencies of biological terminology. In biomedical marker discovery studies, tools that rely on NLP models to automatically and accurately extract relations of biomedical entities are valuable as they can provide a more thorough survey of all available literature, hence providing a less biased result compared to manual curation. In addition, the fast speed of machine reader helps quickly orient research and development. Results:To address the aforementioned needs, we developed automatic training data labeling, rule-based biological terminology cleaning and a more accurate NLP model for binary associative and multi-relation prediction into the MarkerGenie program. We demonstrated the effectiveness of the proposed methods in identifying relations between biomedical entities on various benchmark datasets and case studies. Availability and implementation:MarkerGenie is available at https://www.genegeniedx.com/markergenie/. Data for model training and evaluation, term lists of biomedical entities, details of the case studies and all trained models are provided at https://drive.google.com/drive/folders/14RypiIfIr3W_K-mNIAx9BNtObHSZoAyn?usp=sharing. Supplementary information:Supplementary data are available at Bioinformatics Advances online.
更多
查看译文
关键词
extraction,relation,nlp-enabled,text-mining
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要