Entity Recognition by Distant Supervision with Soft List Constraint.

Advanced Data Mining and Applications, ADMA 2017 (2017)

Abstract
Supervised named entity recognition systems often suffer from inadequate training data when dealing with domain-specific corpora, e.g., documents in medicine and healthcare. For these domains, obtaining some seed words or phrases is not very difficult. Positive instances obtained through distant supervision based on the seeds can then be used to learn recognition models. However, with a limited number of training samples and no negative ones, the classification results may be unsatisfactory. In this paper, we leverage the conjunction-and-comma writing style as a list constraint to enlarge the set of training instances. Unlike earlier studies, we formulate two kinds of constraints, namely the soft list constraint and the mention constraint, as regularizers. We then incorporate the constraints into a unified discriminative learning framework and propose a joint optimization algorithm. The experimental results show that our model is superior to state-of-the-art baselines on a large collection of documents about drugs.
Keywords
Distant supervision, Biomedical information extraction
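
The list idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the function names, the regular expression, and the "item ends with a seed" heuristic are all assumptions made for illustration. It only shows how a seed found inside a comma/conjunction list might suggest the neighboring list items as additional candidate instances; the paper enforces this as a soft regularizer in a learning framework rather than as the hard rule used here.

```python
import re

# Hypothetical helper: split a sentence into list items on commas and "and"/"or".
def split_list_items(sentence):
    pattern = r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+"
    parts = re.split(pattern, sentence.strip(" ."))
    return [p.strip() for p in parts if p.strip()]

# Hypothetical helper: if any list item ends with a seed phrase, propose the
# other items as candidate entities (weak evidence, not gold labels).
def expand_by_list(sentence, seeds):
    seed_set = {s.lower() for s in seeds}
    items = split_list_items(sentence)

    def has_seed(item):
        low = item.lower()
        return any(low == s or low.endswith(" " + s) for s in seed_set)

    if not any(has_seed(i) for i in items):
        return []
    return [i for i in items if not has_seed(i)]

candidates = expand_by_list(
    "Patients received aspirin, ibuprofen and naproxen.", {"aspirin"}
)
print(candidates)
```

A real system would restrict candidates to noun phrases and weight them below seed matches, since items in a list do not always share a type.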