End-to-end relation extraction based on bootstrapped multi-level distant supervision

WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS (2020)

Abstract
Distantly supervised relation extraction has been widely used to identify new relation facts from free text, because an existing knowledge base lets these models build a large training set with little human intervention and at low cost in manpower and time. However, existing distantly supervised models all rely on a single-node classifier, so they suffer from a serious false categorization problem, especially when there are thousands of relations. In this paper, we propose a novel end-to-end model for relation extraction based on distant supervision. Our model divides the original categorization task into a number of sub-tasks that together build a multi-level, tree-like categorization structure. With this structure, an unlabelled relation instance is categorized step by step along a path from the root node to a leaf node. An additional benefit of the structure is that it can be used to select negative samples from the training data for each child node. In addition, to the best of our knowledge, no prior work updates the categorization model with newly identified relation facts, which limits improvements in extraction precision and recall. Although bootstrapping methods can help, they require additional computation to evaluate the quality of extracted patterns or tuples when selecting new instances for the next iteration. We therefore propose bootstrapped distant supervision, which iteratively updates the distant supervision model with newly learned relation facts and evaluates instance quality directly with the scores the model already produces, instead of performing additional computation. As a result, extraction precision and recall improve further. To save time and manpower, we also propose an adaptive method that uses a mapping function to choose a suitable threshold for each iteration automatically, rather than relying on fixed, manually chosen thresholds.
Experimental results on three real datasets show that our approach outperforms state-of-the-art approaches, improving extraction quality by more than 12%.
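The tree-like categorization and per-node negative sampling described above can be illustrated with a minimal Python sketch. The abstract does not specify the per-node classifiers or the exact negative-sampling rule, so everything here is assumed: `Node`, `classify`, `leaves`, `negatives_for_child`, and `keyword_scorer` are hypothetical names, the toy relation hierarchy is invented for illustration, and negatives for a child node are taken (as an assumption) to be training instances whose labels fall under that child's siblings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical scorer signature: (instance text, candidate child name) -> score.
Scorer = Callable[[str, str], float]


@dataclass
class Node:
    """One node of an assumed relation hierarchy; leaves are concrete relations."""
    name: str
    children: List["Node"] = field(default_factory=list)
    scorer: Optional[Scorer] = None  # scores this node's children


def classify(root: Node, instance: str) -> List[str]:
    """Route an instance from the root to a leaf, taking the best-scoring child at each level."""
    path = [root.name]
    node = root
    while node.children:
        if node.scorer is None:
            raise ValueError(f"internal node {node.name!r} has no scorer")
        best = max(node.children, key=lambda c: node.scorer(instance, c.name))
        path.append(best.name)
        node = best
    return path


def leaves(node: Node) -> List[str]:
    """All leaf relation names under a node."""
    if not node.children:
        return [node.name]
    return [leaf for child in node.children for leaf in leaves(child)]


def negatives_for_child(parent: Node, child: Node,
                        data: List[Tuple[str, str]]) -> List[str]:
    """Assumed rule: negatives are instances labelled with a leaf under a sibling of `child`."""
    sibling_leaves = {leaf for s in parent.children if s is not child for leaf in leaves(s)}
    return [text for text, label in data if label in sibling_leaves]


# Toy keyword scorer: a hypothetical stand-in for a trained per-node classifier.
KEYWORDS = {
    "person": {"born", "lives"},
    "organization": {"founded"},
    "born_in": {"born"},
    "lives_in": {"lives"},
    "founded_by": {"founded"},
}


def keyword_scorer(instance: str, child_name: str) -> float:
    return float(len(set(instance.lower().split()) & KEYWORDS[child_name]))


born_in, lives_in, founded_by = Node("born_in"), Node("lives_in"), Node("founded_by")
person = Node("person", [born_in, lives_in], keyword_scorer)
organization = Node("organization", [founded_by], keyword_scorer)
root = Node("root", [person, organization], keyword_scorer)
```

With this toy tree, `classify(root, "Obama was born in Honolulu")` returns `["root", "person", "born_in"]`. Because each internal node only chooses among its own children, each sub-task is a small classification problem, and its negatives can be drawn from sibling subtrees rather than from the whole label set.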
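The bootstrapped loop, which reuses the model's own scores as instance quality and picks a threshold per iteration, can be sketched as follows. This is not the authors' implementation: the nearest-centroid "model" over one numeric feature, the `1 / (1 + distance)` score, and the `mean + alpha * std` mapping function in `adaptive_threshold` are all hypothetical stand-ins, since the abstract does not define the classifier or the mapping function.

```python
import statistics
from typing import Dict, List, Tuple

Labelled = List[Tuple[float, str]]  # (feature value, relation label)


def train_centroids(data: Labelled) -> Dict[str, float]:
    """Toy stand-in for the relation classifier: one mean feature value per relation."""
    groups: Dict[str, List[float]] = {}
    for x, y in data:
        groups.setdefault(y, []).append(x)
    return {y: statistics.mean(xs) for y, xs in groups.items()}


def predict(model: Dict[str, float], x: float) -> Tuple[str, float]:
    """Return (label, score); the score is reused directly as the instance-quality measure."""
    label = min(model, key=lambda y: abs(x - model[y]))
    return label, 1.0 / (1.0 + abs(x - model[label]))


def adaptive_threshold(scores: List[float], alpha: float = 0.5,
                       lo: float = 0.5, hi: float = 0.95) -> float:
    """Hypothetical mapping function: mean + alpha * std of this round's scores, clipped to [lo, hi]."""
    t = statistics.mean(scores) + alpha * statistics.pstdev(scores)
    return min(hi, max(lo, t))


def bootstrap(seed: Labelled, unlabelled: List[float], iterations: int = 5) -> Labelled:
    labelled = list(seed)
    pool = list(unlabelled)
    for _ in range(iterations):
        if not pool:
            break
        model = train_centroids(labelled)  # retrain on seed facts plus newly accepted ones
        preds = [(x, *predict(model, x)) for x in pool]
        t = adaptive_threshold([s for _, _, s in preds])  # per-iteration threshold, no manual choice
        accepted = [(x, y) for x, y, s in preds if s >= t]
        if not accepted:
            break
        labelled.extend(accepted)
        pool = [x for x, _, s in preds if s < t]  # low-confidence candidates wait for the next round
    return labelled
```

For example, `bootstrap([(0.0, "born_in"), (10.0, "founded_by")], [0.5, 9.5, 3.0, 5.0])` accepts 0.5 and 9.5 in the first round, then stops in the second round because no remaining candidate clears the clipped threshold. No separate pattern- or tuple-scoring pass is needed, since the acceptance decision uses only the scores `predict` already returns.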
Keywords
Distant Supervision, Bootstrapping, Relation Extraction, Information Extraction