Corpus-Based Relation Extraction by Identifying and Refining Relation Patterns

Machine Learning and Knowledge Discovery in Databases: Research Track, ECML PKDD 2023, Part IV (2023)

Abstract
Automated relation extraction without extensive human-annotated data is a crucial yet challenging task in text mining. Existing studies typically use lexical patterns to label a small set of high-precision relation triples and then employ distributional methods to enhance detection recall. This precision-first approach works well for common relation types but struggles with unconventional and infrequent ones. In this work, we propose a recall-first approach that first leverages high-recall patterns (e.g., a per:siblings relation normally requires both the head and tail entities to be of the person type) to provide initial candidate relation triples with weak labels, and then clusters these candidate triples in a latent spherical space to extract high-quality weak supervision. Specifically, we present a novel framework, RClus, in which each relation triple is represented by its head/tail entity types and the shortest dependency path between the entity mentions. RClus first applies high-recall patterns to narrow down each relation type's candidate space. It then embeds candidate relation triples in a latent space and performs spherical clustering to filter out noisy candidates and identify high-quality weakly labeled triples. Finally, RClus uses these triples to prompt-tune a pre-trained language model, which is then applied for improved extraction coverage. We conduct extensive experiments on three public datasets and demonstrate that RClus outperforms weakly-supervised baselines by a large margin and generally achieves better performance than fully-supervised methods in low-resource settings.
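To make the recall-first pipeline described above more concrete, the sketch below illustrates one way the three stages (high-recall type patterns, spherical clustering of triple embeddings, and selection of weakly labeled triples) could be wired together. It is a minimal illustration under stated assumptions: the type constraints, the toy candidate triples, the hashed bag-of-tokens embedding, and all function names are hypothetical stand-ins, not the authors' implementation, and cosine-style clustering is approximated by k-means on unit-normalized vectors.

```python
# Illustrative sketch of a recall-first weak-labeling pipeline in the spirit of RClus.
# All names and data here are hypothetical; the paper's actual representations and
# clustering procedure may differ.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

# Stage 1: high-recall patterns -- coarse entity-type constraints per relation type.
type_constraints = {
    "per:siblings": ("PERSON", "PERSON"),
    "org:founded_by": ("ORGANIZATION", "PERSON"),
}

# Candidate triples: (head_type, tail_type, shortest_dependency_path, relation_guess)
candidates = [
    ("PERSON", "PERSON", "nsubj<-brother->nmod", "per:siblings"),
    ("PERSON", "CITY", "nsubj<-born->nmod", "per:siblings"),        # violates the type pattern
    ("ORGANIZATION", "PERSON", "nsubjpass<-founded->agent", "org:founded_by"),
]

def passes_pattern(head_type, tail_type, relation):
    """Keep a candidate only if its entity types satisfy the relation's coarse pattern."""
    expected = type_constraints.get(relation)
    return expected is not None and (head_type, tail_type) == expected

filtered = [c for c in candidates if passes_pattern(c[0], c[1], c[3])]

# Stage 2: embed each surviving triple (a toy hashed bag-of-tokens vector stands in for a
# learned latent representation) and cluster on the unit sphere (cosine geometry).
def embed_triple(head_type, tail_type, dep_path, dim=32):
    vec = np.zeros(dim)
    tokens = [head_type, tail_type, *dep_path.replace("<-", " ").replace("->", " ").split()]
    for tok in tokens:
        vec[hash(tok) % dim] += 1.0
    return vec

X = normalize(np.stack([embed_triple(h, t, p) for h, t, p, _ in filtered]))
km = KMeans(n_clusters=min(2, len(filtered)), n_init=10, random_state=0).fit(X)

# Stage 3: triples close to their cluster centroid are kept as high-quality weak labels,
# which would then be used to prompt-tune a pre-trained language model.
dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
weak_labels = [triple for triple, d in zip(filtered, dists) if d <= np.median(dists)]
print(weak_labels)
```

In this sketch, the median-distance cutoff is an arbitrary stand-in for whatever confidence criterion is used to discard noisy cluster members; the key design point is that candidate generation is deliberately permissive and precision is recovered afterwards by clustering.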
Keywords
Relation Extraction, Weak Supervision, Latent Space Clustering