Computing semantic relatedness using Wikipedia-based explicit semantic analysis

IJCAI (2007)

Abstract

Computing semantic relatedness of natural language texts requires access to vast amounts of common-sense and domain-specific world knowledge. We propose Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia. We use machine learning techniques to explicitly represent the meaning of any text as a weighted vector of Wikipedia-based concepts. Assessing the relatedness of texts in this space amounts to comparing the corresponding vectors using conventional metrics (e.g., cosine). Compared with the previous state of the art, using ESA results in substantial improvements in correlation of computed relatedness scores with human judgments: from r = 0.56 to 0.75 for individual words and from r = 0.60 to 0.72 for texts. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
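The pipeline the abstract describes (text → weighted concept vector → cosine comparison) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation. It assumes a TF-IDF weighting of words within each concept's article. It also assumes that a text's concept vector is the sum of its words' concept weights, scaled by each word's frequency in the text. The three-article "Wikipedia", the tokenizer, and the names `build_index`, `interpret`, and `cosine` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy "Wikipedia": concept title -> article text (invented for illustration).
CONCEPTS = {
    "Cat": "cat feline pet whiskers mouse hunt",
    "Dog": "dog canine pet bark leash",
    "Computer": "computer mouse keyboard software cpu",
}

def tokenize(text):
    # Naive whitespace tokenizer; a real system would also stem and drop stopwords.
    return text.lower().split()

def build_index(concepts):
    """Inverted index: word -> {concept: TF-IDF weight of the word in that concept's article}.

    The weighting (1 + log tf) * log(N / df) is an assumed TF-IDF variant.
    """
    n = len(concepts)
    tfs = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    df = Counter(w for tf in tfs.values() for w in tf)  # number of articles containing each word
    index = {}
    for c, tf in tfs.items():
        for w, f in tf.items():
            weight = (1 + math.log(f)) * math.log(n / df[w])
            if weight > 0:  # words found in every article carry no signal
                index.setdefault(w, {})[c] = weight
    return index

def interpret(text, index):
    """Map a text to a weighted concept vector by summing its words' concept weights."""
    vec = Counter()
    for w, f in Counter(tokenize(text)).items():
        for c, k in index.get(w, {}).items():
            vec[c] += f * k
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(x * v.get(c, 0.0) for c, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

index = build_index(CONCEPTS)
print(round(cosine(interpret("feline whiskers", index), interpret("cat", index)), 3))  # 1.0
print(round(cosine(interpret("mouse", index), interpret("cat", index)), 3))            # 0.707
print(cosine(interpret("cat", index), interpret("computer", index)))                   # 0.0
```

In this toy index "feline whiskers" and "cat" share no words, yet they map to the same concept and score 1.0. The ambiguous word "mouse" spreads its weight evenly across Cat and Computer, so it is only partially related to "cat".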

Keywords

wikipedia-based explicit semantic analysis,human user,computed relatedness score,high-dimensional space,natural language text,esa model,computing semantic relatedness,natural concept,human judgment,machine learning,explicit semantic analysis,semantic relatedness,natural language,computational semantics