The DDI NCBI Corpus — Towards a Larger Resource for Drug-Drug Interactions in PubMed

Semantic Scholar (2016)

Abstract
Manually annotated corpora are of great importance for the development of NLP systems, both as training and evaluation data. However, the shortage of annotated corpora frequently presents a key bottleneck in developing reliable applications in the health and biomedical domain, and demonstrates the need for larger annotated corpora. Utilizing and integrating existing corpora appears to be a vital, yet not trivial, avenue towards achieving this goal. Previous studies have revealed that drug-drug interaction (DDI) extraction methods, when trained on DrugBank data, do not perform well on PubMed articles. With the ultimate goal of improving the performance of our DDI extraction method on PubMed® articles, we construct a new gold standard corpus of drug-drug interactions in PubMed that we call the DDINCBI corpus. We combine it with the existing DDIExtraction 2013 PubMed corpus and demonstrate that merging these two corpora achieves higher performance than using either source separately. We release the DDINCBI corpus and make it publicly available for download in BioC format at: http://bioc.sourceforge.net/. In addition, we make the existing DDIExtraction 2013 corpus available in BioC format.