Development of large-scale TCM corpus using hybrid named entity recognition methods for clinical phenotype detection: An initial study

CIBD(2014)

Abstract
Clinical data is one of the core data repositories in traditional Chinese medicine (TCM), because TCM is a clinically based medicine. However, most TCM clinical data, such as electronic medical records, is still stored as free text. Because the TCM field lacks a large-scale annotated corpus, this paper aims to develop an annotation system for a TCM clinical text corpus. To reduce manual labor, we implement three named entity recognition methods (a supervised machine learning method, an unsupervised method, and structured data comparison) to assist the batch annotation of clinical records before manual checking. We developed the system in Java and have effectively curated more than 2,000 chief complaint records.
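The abstract does not describe how the three methods are implemented. As a rough illustration of the general idea behind dictionary-style pre-annotation, the sketch below matches a term list against free-text chief complaints with forward maximum matching and emits candidate spans for a human to check. It assumes the "structured data" can be reduced to a term-to-label map. The class `DictionaryPreAnnotator`, the `SYMPTOM` label, and the sample terms are hypothetical and not taken from the paper. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of dictionary-based pre-annotation (forward maximum matching).
 * The term-to-label map stands in for terms derived from structured data; it is an
 * assumption for illustration, not the resource used in the paper.
 */
public class DictionaryPreAnnotator {

    /** A candidate entity span [start, end) with a label, awaiting manual checking. */
    public record Span(int start, int end, String text, String label) {}

    private final Map<String, String> termToLabel;
    private final int maxTermLength;

    public DictionaryPreAnnotator(Map<String, String> termToLabel) {
        this.termToLabel = Map.copyOf(termToLabel);
        int max = 0;
        for (String term : termToLabel.keySet()) {
            max = Math.max(max, term.length());
        }
        this.maxTermLength = max;
    }

    /** Scans left to right, taking the longest dictionary term that starts at each position. */
    public List<Span> annotate(String text) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int matchedEnd = -1;
            int limit = Math.min(text.length(), i + maxTermLength);
            // Try the longest candidate substring first, then shrink it.
            for (int end = limit; end > i; end--) {
                if (termToLabel.containsKey(text.substring(i, end))) {
                    matchedEnd = end;
                    break;
                }
            }
            if (matchedEnd > 0) {
                String term = text.substring(i, matchedEnd);
                spans.add(new Span(i, matchedEnd, term, termToLabel.get(term)));
                i = matchedEnd; // skip past the matched term
            } else {
                i++; // no term starts here; advance one character
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // Illustrative, hypothetical symptom dictionary.
        DictionaryPreAnnotator annotator = new DictionaryPreAnnotator(Map.of(
                "头痛", "SYMPTOM",
                "发热", "SYMPTOM",
                "恶寒发热", "SYMPTOM"));
        for (Span s : annotator.annotate("头痛三天，恶寒发热")) {
            System.out.println(s);
        }
    }
}
```

Preferring the longest match means "恶寒发热" is proposed as one span rather than only its suffix "发热". A pure dictionary matcher cannot find terms that are missing from the dictionary. That is one reason to combine it with learned or unsupervised methods and to keep a manual checking step.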
Keywords
clinical data, large-scale annotation corpus, electronic medical record, traditional chinese medicine, tcm clinical text corpus, named entity recognition, structured data comparison, electronic health records, manual checking, clinical records, batch annotations, clinically based medicine, unsupervised method, clinical phenotype detection, named entity recognition methods, natural language processing, core data repositories, annotation system, text analysis, supervised machine learning method, java, unsupervised learning, hidden markov models, databases, data mining