The Best Published Result Is Random: Sequential Testing And Its Effect On Reported Effectiveness

SIGIR '15: The 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, August 2015

Abstract
Reusable test collections allow researchers to rapidly test different algorithms to find the one that works "best". But because of randomness in the topic sample, in relevance judgments, or in interactions among system components, extreme results can arise entirely by chance, particularly when a collection becomes very popular. We argue that the best known published effectiveness on any given collection could be measured as much as 20% higher than its "true" intrinsic effectiveness, and that many other systems with lower measured effectiveness could have substantially higher intrinsic effectiveness.
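The mechanism behind this claim is that the highest of many noisy measurements is biased upward. A small simulation can illustrate it. This is a sketch under our own assumptions, not the authors' method: each per-topic score is modeled as the system's true effectiveness plus independent Gaussian noise, and the number of systems, number of topics, noise level, and the function names `measured_effectiveness` and `best_reported_inflation` are all hypothetical. It is not meant to reproduce the 20% figure.

```python
import random
import statistics


def measured_effectiveness(true_score: float, n_topics: int, topic_sd: float,
                           rng: random.Random) -> float:
    """Mean score over one random sample of topics.

    Hypothetical noise model (an assumption, not from the paper): each topic's
    score is true_score plus independent Gaussian noise with std. dev. topic_sd.
    """
    return statistics.fmean(true_score + rng.gauss(0.0, topic_sd)
                            for _ in range(n_topics))


def best_reported_inflation(true_scores, n_topics=50, topic_sd=0.2,
                            trials=50, seed=0):
    """Average ratio of the best measured score to that system's true score.

    In each trial every system is evaluated on a fresh topic sample, and the
    system with the highest measured score plays the "best published result".
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        measured = [measured_effectiveness(t, n_topics, topic_sd, rng)
                    for t in true_scores]
        # Pick the system whose *measured* effectiveness is highest.
        best = max(range(len(true_scores)), key=measured.__getitem__)
        ratios.append(measured[best] / true_scores[best])
    return statistics.fmean(ratios)
```

For example, with 200 hypothetical systems that all share the same true effectiveness, `best_reported_inflation([0.30] * 200)` returns a ratio above 1.0: the winning score exceeds the winner's true effectiveness purely because it was selected as the maximum. Increasing the number of systems widens the gap, which mirrors the abstract's point about popular collections.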
Keywords
information retrieval,test collections,evaluation,statistical analysis