Relaxing Orthogonality Assumption in Conceptual Text Document Similarity.

DOCENG(2016)

引用 0|浏览26
暂无评分
摘要
ABSTRACTBy reflecting the degree of proximity or remoteness of documents, similarity measure plays the key role in text analytics. Traditional measures, e.g. cosine similarity, assume that documents are represented in an orthogonal space formed by words as dimensions. Words are considered independent from each other and document similarity is computed based on lexical overlap. This assumption is also made in the bag of concepts representation of documents while the space is formed by concepts. This paper proposes new semantic similarity measures without relying on the orthogonality assumption. By employing Wikipedia as an external resource, we introduce five similarity measures using concept-concept relatedness. Experimental results on real text datasets reveal that eliminating the orthogonality assumption improves the quality of text clustering algorithms.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要