Corpus-based and knowledge-based measures of text semantic similarity

AAAI(2006)

Citations: 1793 | Views: 464
Abstract
This paper presents a method for measuring the semantic similarity of texts, using corpus-based and knowledge-based measures of similarity. Previous work on this problem has focused mainly on either large documents (e.g., text classification, information retrieval) or individual words (e.g., synonymy tests). Given that a large fraction of the information available today, on the Web and elsewhere, consists of short text snippets (e.g., abstracts of scientific documents, image captions, product descriptions), in this paper we focus on measuring the semantic similarity of short texts. Through experiments performed on a paraphrase data set, we show that the semantic similarity method outperforms methods based on simple lexical matching, resulting in up to 13% error rate reduction with respect to the traditional vector-based similarity metric.
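To make the comparison point concrete, here is a minimal Python sketch of a lexical-matching baseline of the kind the abstract calls a vector-based similarity metric: cosine similarity over bag-of-words vectors. It is an illustration under stated assumptions, not the paper's implementation. The tokenizer, the raw term-frequency weighting, and the function names `tokenize` and `cosine_similarity` are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens.

    Assumption: a simple regex tokenizer; the abstract does not specify one.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors of two texts.

    Assumption: unweighted term counts; a real baseline might use
    a different weighting scheme.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only tokens that occur in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Because only exact token overlap counts, two short texts that express the same idea with different words ("car" versus "automobile") receive no credit for the synonymous pair. That is the kind of gap a semantic similarity measure is meant to close.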
Keywords
semantic similarity, text semantic similarity, short text, information retrieval, text classification, large document, error rate reduction, large fraction, traditional vector-based similarity metric, knowledge-based measure, short text snippet, semantic similarity method out-performs, error rate, knowledge base