The chemical corpus of the NLM-Chem BioCreative VII track: Full-text Chemical Identification and Indexing in PubMed articles

semanticscholar(2021)

Abstract
The automatic recognition of chemical names and their corresponding database identifiers in biomedical text is an important first step for many downstream text-mining applications. The NLM-Chem track at BioCreative VII aimed to foster the development of algorithms that can accurately identify chemical entities in the biomedical literature and further determine which chemical substances are candidates for article indexing. The NLM-Chem track corpus is manually curated, comprehensively annotated with chemical entities, and indexed with chemical substances. The NLM-Chem BioCreative VII corpus consists of three parts: (1) a high-quality, manually annotated corpus of 200 full-text PubMed Central (PMC) articles; (2) a collection of 11,500 PubMed documents previously annotated in the ChemDNER and BC5CDR challenges, which we have enriched with their corresponding chemical substance indexing; and (3) a collection of 1,387 recently published PMC articles, with chemical substance indexing assigned manually by experts at the National Library of Medicine. This document details the characteristics of this novel resource for chemical entity recognition. Using this new resource, we have demonstrated improvements in chemical entity recognition algorithms.

Keywords: corpus annotation; inter-annotator agreement; chemical entity recognition; text mining