Building a semantically annotated corpus of clinical texts.

Journal of Biomedical Informatics (2009)

Abstract
In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.
Keywords
corpora, semantic annotation scheme, high-quality semantically annotated corpus, final corpus, annotation guidelines, resulting corpus, semantically annotated corpus, clinical text, semantic annotation, temporal annotation, annotation methodology, gold standards, information extraction, corpus construction, natural language processing, adaptive information extraction system, evaluation, semantically annotated resource, text mining, effective information extraction system, gold standard