Topic Modeling and Word Sense Disambiguation on the Ancora Corpus

Procesamiento del Lenguaje Natural (2015)

Abstract
In this paper we present an approach to Word Sense Disambiguation (WSD) based on Topic Modeling (LDA). The approach has two steps: first, a binary classifier decides whether the most frequent sense (MFS) applies; then, a second classifier handles the cases where it does not. An exhaustive evaluation on the Spanish corpus Ancora analyzes the performance of the two-step system and the impact of the context and of the system's parameters. Our best experiment reaches an accuracy of 74.53, which is 6 points above the highest baseline. All the software developed for these experiments has been made freely available to enable reproducibility and allow reuse.
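The two-step decision described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released software. It assumes that each occurrence of a target word has already been mapped to an LDA topic distribution of its context (computing that distribution is out of scope here), and it uses a nearest-centroid classifier with cosine similarity as a stand-in for both steps, because the abstract does not say which classifiers are used. The names `NearestCentroid`, `TwoStepWSD` and `accuracy`, the sense labels and the toy data are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Assumption: each example is an LDA topic distribution (list of floats)
# for the context of one occurrence of the target word.


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NearestCentroid:
    """Stand-in classifier: predicts the label whose centroid is most similar."""

    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        self.centroids = {label: centroid(vs) for label, vs in groups.items()}
        return self

    def predict(self, x):
        return max(self.centroids, key=lambda label: cosine(x, self.centroids[label]))


class TwoStepWSD:
    """Step 1: MFS vs. not MFS (binary). Step 2: choose among the other senses."""

    def fit(self, X, senses):
        # The most frequent sense is estimated from the training labels.
        self.mfs = Counter(senses).most_common(1)[0][0]
        self.step1 = NearestCentroid().fit(X, [s == self.mfs for s in senses])
        # Step 2 is trained only on examples whose sense is not the MFS.
        rest = [(x, s) for x, s in zip(X, senses) if s != self.mfs]
        self.step2 = (
            NearestCentroid().fit([x for x, _ in rest], [s for _, s in rest])
            if rest
            else None
        )
        return self

    def predict(self, x):
        # Return the MFS when step 1 says so or when no other sense was seen.
        if self.step2 is None or self.step1.predict(x):
            return self.mfs
        return self.step2.predict(x)


def accuracy(predicted, gold):
    """Percentage of predictions that match the gold senses."""
    return 100.0 * sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy data over 3 topics (invented for illustration, not taken from Ancora).
train_X = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.75, 0.15, 0.1],
           [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
train_y = ["s1", "s1", "s1", "s2", "s3"]

model = TwoStepWSD().fit(train_X, train_y)
print(model.predict([0.9, 0.05, 0.05]))  # → s1 (kept as the MFS by step 1)
print(model.predict([0.05, 0.9, 0.05]))  # → s2 (routed to step 2)
```

Because the second classifier in this sketch is trained only on non-MFS examples, the frequent sense cannot dominate its decision; the trade-off is that any error made by the binary first step propagates directly into the final label.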
Keywords
Topic Modeling, LDA, Most Frequent Sense, WSD, Ancora corpus