Repeatability Corner Cases in Document Ranking: The Impact of Score Ties

arXiv: Information Retrieval(2018)

引用 23|浏览15
暂无评分
摘要
Document ranking experiments should be repeatable. However, the interaction between multi-threaded indexing and score ties during retrieval may yield non-deterministic rankings, making repeatability not as trivial as one might imagine. In the context of the open-source Lucene search engine, score ties are broken by internal document ids, which are assigned at index time. Due to multi-threaded indexing, which makes experimentation with large modern document collections practical, internal document ids are not assigned consistently between different index instances of the same collection, and thus score ties are broken unpredictably. This short paper examines the effectiveness impact of such score ties, quantifying the variability that can be attributed to this phenomenon. The obvious solution to this non-determinism and to ensure repeatable document ranking is to break score ties using external collection document ids. This approach, however, comes with measurable efficiency costs due to the necessity of consulting external identifiers during query evaluation.
更多
查看译文
关键词
Lucene, query evaluation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要