Facilitating pattern discovery for relation extraction with semantic-signature-based clustering.

CIKM '11: International Conference on Information and Knowledge Management Glasgow Scotland, UK October, 2011(2011)

引用 13|浏览22
暂无评分
摘要
Hand-crafted textual patterns have been the mainstay device of practical relation extraction for decades. However, there has been little work on reducing the manual effort involved in the discovery of effective textual patterns for relation extraction. In this paper, we propose a clustering-based approach to facilitate the pattern discovery for relation extraction. Specifically, we define the notion of semantic signature to represent the most salient features of a textual fragment. We then propose a novel clustering algorithm based on semantic signature, S2C, and its enhancement S2C+. Experiments on two real-world data sets show that, when compared with k-means clustering, S2C and S2C+ are at least an order of magnitude faster, while generating high quality clusters that are at least comparable to the best clusters generated by k-means without requiring any manual tuning. Finally, a user study confirms that our clustering-based approach can indeed help users discover effective textual patterns for relation extraction with only a fraction of the manual effort required by the conventional approach.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要