Noise-Reduction for Automatically Transferred Relevance Judgments

Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2022)

Abstract
The TREC Deep Learning tracks used MS MARCO Version 1 as their official training data until 2020 and switched to Version 2 in 2021. For Version 2, all previously judged documents were re-crawled. Interestingly, in the track's 2021 edition, models trained on the new data were less effective than models trained on the old data. To investigate this phenomenon, we compare the predicted relevance probabilities of monoT5 for the two versions of the judged documents and find substantial differences. A further manual inspection reveals major content changes for some documents (e.g., the new version being off-topic). To analyze whether these changes may have contributed to the observed effectiveness drop, we conduct experiments with different document version selection strategies. Our results show that training a retrieval model on the "wrong" version can reduce the nDCG@10 by up to 75%.
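The comparison described above relies on monoT5's predicted relevance probability for a query-document pair. As a minimal sketch (not the authors' code), the snippet below scores the same query against two hypothetical versions of a re-crawled document using the publicly available castorini/monot5-base-msmarco checkpoint; the query and document strings are placeholders, and the "true"/"false" scoring scheme follows the standard monoT5 formulation.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Assumption: the public monoT5 checkpoint trained on MS MARCO.
MODEL_NAME = "castorini/monot5-base-msmarco"
tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME).eval()

# monoT5 scores a pair by generating "true"/"false"; the softmax over
# these two token logits at the first decoding step is the relevance probability.
TRUE_ID = tokenizer.encode("true")[0]
FALSE_ID = tokenizer.encode("false")[0]

def relevance_probability(query: str, document: str) -> float:
    prompt = f"Query: {query} Document: {document} Relevant:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    decoder_input_ids = torch.full(
        (1, 1), model.config.decoder_start_token_id, dtype=torch.long
    )
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, 0]
    probs = torch.softmax(logits[[FALSE_ID, TRUE_ID]], dim=0)
    return probs[1].item()  # probability of the "true" token

# Hypothetical example: the same judged document in Version 1 and its re-crawl in Version 2.
query = "what is the boiling point of water"
doc_v1 = "Water boils at 100 degrees Celsius at sea level ..."
doc_v2 = "This page has moved. Click here to continue ..."  # re-crawled, now off-topic

p_v1 = relevance_probability(query, doc_v1)
p_v2 = relevance_probability(query, doc_v2)
print(f"P(relevant | V1) = {p_v1:.3f}, P(relevant | V2) = {p_v2:.3f}")
```

A large gap between the two probabilities for a document that carries the same (transferred) relevance judgment is the kind of signal that motivates the version-selection strategies examined in the paper.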
Keywords
MS MARCO, monoT5, Relevance transfer