Accurate keyphrase extraction by discriminating overlapping phrases

Journal of Information Science(2014)

引用 25|浏览28
暂无评分
摘要
In this paper we define the document phrase maximality index DPM-index, a new measure to discriminate overlapping keyphrase candidates in a text document. As an application we developed a supervised learning system that uses 18 statistical features, among them the DPM-index and five other new features. We experimentally compared our results with those of 21 keyphrase extraction methods on SemEval-2010/Task-5 scientific articles corpus. When all the systems extract 10 keyphrases per document, our method enhances by 13% the F-score of the best system. In particular, the DPM-index feature increases the F-score of our keyphrase extraction system by a rate of 9%. This makes the DPM-index contribution comparable to that of the well-known TFIDF measure on such a system.
更多
查看译文
关键词
information extraction,keyphrase extraction,scientific digital libraries,text mining
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要