ConPhrase: Enhancing Context-Aware Phrase Mining From Text Corpora

IEEE Transactions on Knowledge and Data Engineering (2023)

Abstract
Phrase mining is an essential step when transforming unstructured text into structured information, in which the aim is to extract high-quality phrases from given corpora automatically. Existing statistics-based methods have achieved state-of-the-art performance on this task. However, such methods often rely heavily on statistical signals to extract quality phrases, ignoring the effect of contextual information. In this paper, we propose a novel context-aware method, called ConPhrase, for quality phrase mining under distantly supervised settings. Specifically, ConPhrase formulates phrase mining as a sequence labeling problem by considering local contextual information, and also incorporates distant supervision methods to automatically generate labeled data. It comprises two modules designed to tackle global information scarcity and noisy data filtration: 1) a topic-aware phrase recognition network that incorporates domain-related topic information into word representation learning to identify quality phrases effectively; 2) an instance selection network that focuses on choosing correct sentences with reinforcement learning for improving the prediction performance of the phrase recognition network. Moreover, we also propose an extended variant of ConPhrase, called ConPhrase+, that further enhances phrase recognition by utilizing document-level contextual information across sentences within the entire document. Experimental results show that contextual information is indispensable for phrase mining and our context-aware methods perform significantly better than state-of-the-art approaches on three publicly available datasets.
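
The sequence-labeling formulation with distantly supervised labels can be illustrated with a minimal sketch. It assumes a small dictionary of known quality phrases (for example, drawn from a knowledge base) and a B/I/O tagging scheme; the abstract does not specify either, so the function name `distant_bio_labels`, the `phrase_dict` and `max_len` parameters, and the greedy longest-match strategy are all hypothetical choices made for illustration, not the authors' implementation.

```python
def distant_bio_labels(tokens, phrase_dict, max_len=5):
    """Tag tokens with B/I/O by greedy longest match against phrase_dict.

    Assumption: phrase_dict holds lowercase, space-joined phrases taken
    from some external resource; this is not the paper's actual pipeline.
    """
    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(lowered[i:i + n]) in phrase_dict:
                matched = n
                break
        if matched:
            labels[i] = "B"  # first token of a matched phrase
            for j in range(i + 1, i + matched):
                labels[j] = "I"  # remaining tokens of the phrase
            i += matched
        else:
            i += 1  # no phrase starts here; token stays "O"
    return labels


if __name__ == "__main__":
    sentence = "We study phrase mining with reinforcement learning".split()
    known = {"phrase mining", "reinforcement learning"}
    for token, label in zip(sentence, distant_bio_labels(sentence, known)):
        print(f"{token}\t{label}")
```

Labels produced this way are noisy: a dictionary misses unseen phrases and ignores context, which is the kind of noise the abstract's instance selection network is meant to filter out.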
Keywords
Data mining, Noise measurement, Relational databases, Semantics, Labeling, Task analysis, Hypertension, Information extraction, phrase mining, quality phrase recognition