Assessing the impact of Stemming Accuracy on Information Retrieval - A multilingual perspective.

Inf. Process. Manage. (2016)

Abstract
We tested the quality of many stemmers for English, French, Spanish and Portuguese with both intrinsic and extrinsic metrics. We found that a correlation between the two types of measures does exist, but it is not as strong as one might have expected. In none of the languages was the most accurate stemmer the one that brought the biggest improvement in Information Retrieval.

The quality of stemming algorithms is typically measured in two different ways: (i) how accurately they map the variant forms of a word to the same stem; or (ii) how much improvement they bring to Information Retrieval systems. In this article, we evaluate various stemming algorithms, in four languages, in terms of accuracy and in terms of their aid to Information Retrieval. The aim is to assess whether the most accurate stemmers are also the ones that bring the biggest gain in Information Retrieval. Experiments in English, French, Portuguese, and Spanish show that this is not always the case, as stemmers with higher error rates can yield better retrieval quality. As a byproduct, we also identified the most accurate stemmers and the best ones for Information Retrieval purposes.
Keywords
Stemming, Information Retrieval, Evaluation, Multilingual
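
The first kind of measure mentioned in the abstract, how accurately a stemmer maps the variant forms of a word to the same stem, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the pair-counting scheme, the names `toy_stemmer`, `stemming_errors` and `GROUPS`, and the tiny hand-made word groups. It is not the metric, stemmers, or data used in the article.

```python
from itertools import combinations


def toy_stemmer(word):
    """Hypothetical suffix stripper, for illustration only."""
    for suffix in ("ing", "ed", "s"):
        # Strip the first matching suffix if at least 3 characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemming_errors(groups, stem):
    """Return (understemming_rate, overstemming_rate) over word pairs.

    groups: lists of words that SHOULD share a stem (hand-made gold standard).
    Understemming: a same-group pair that ends up with different stems.
    Overstemming: a cross-group pair that ends up with the same stem.
    """
    labelled = [(word, i) for i, group in enumerate(groups) for word in group]
    same = same_split = diff = diff_merged = 0
    for (w1, g1), (w2, g2) in combinations(labelled, 2):
        merged = stem(w1) == stem(w2)
        if g1 == g2:
            same += 1
            same_split += not merged
        else:
            diff += 1
            diff_merged += merged
    return same_split / same, diff_merged / diff


# Hypothetical gold-standard groups (not taken from the article).
GROUPS = [
    ["connect", "connected", "connecting", "connects"],
    ["run", "running", "runs"],
    ["new", "newer"],
    ["news"],
]

under, over = stemming_errors(GROUPS, toy_stemmer)
print(f"understemming rate: {under:.3f}, overstemming rate: {over:.3f}")
```

On these toy groups the sketch reports an understemming rate of 0.300 (for example, "running" becomes "runn" while "run" stays "run") and an overstemming rate of about 0.029 ("news" is conflated with "new"). The second kind of measure would instead compare retrieval quality with and without the stemmer, which requires a test collection and is not sketched here.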