A Quantitative Analysis of Noise Impact on Document Ranking.

Edward Giamphy, Kévin Sanchis, Gohar Dashyan, Jean-Loup Guillaume, Ahmed Hamdi, Lilian Sanselme, Antoine Doucet

2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2023

Abstract
After decades of massive digitization, a substantial number of documents exist in digital form. The accessibility of these documents is strongly impacted by the quality of document indexing. Most of these documents are indexed in noisy versions that include numerous errors. The noise can stem from manual input mistakes or from the optical character recognition (OCR) process, and it results in errors such as spelling mistakes and missing characters. This paper presents a study of the impact of noise on document ranking, an essential task in natural language processing (NLP) with wide-ranging practical applications. We provide an in-depth quantitative analysis of the impact of recognition errors on document ranking by testing two popular ranking models on several noisy versions of a subset of the MS MARCO passage ranking dataset, with various levels and types of noise. Our study provides insights into the challenges of document ranking under noisy conditions and advocates for developing ranking models that are more robust to noise.
Keywords
Information Retrieval, Document Ranking, Indexing, Noise, OCR Errors, Natural Language Processing
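
To make the idea of "noisy versions with various levels and types of noise" concrete, the following is a minimal sketch of character-level noise injection. It is not the authors' procedure: the function name `add_char_noise`, the three error types (substitution, deletion, insertion), and the `error_rate` parameter are all assumptions chosen for illustration, loosely mirroring the spelling mistakes and missing characters mentioned in the abstract.

```python
import random
import string


def add_char_noise(text, error_rate, seed=0):
    """Corrupt roughly `error_rate` of the alphabetic characters in `text`.

    Hypothetical helper for illustration only; the error model is an assumption.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < error_rate:
            op = rng.choice(("substitute", "delete", "insert"))
            if op == "substitute":
                # replace the character with a random lowercase letter
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "insert":
                # keep the character and add a spurious letter after it
                out.append(ch)
                out.append(rng.choice(string.ascii_lowercase))
            # "delete": append nothing, which drops the character
        else:
            out.append(ch)  # non-letters and unselected letters pass through
    return "".join(out)


passage = "The quality of document indexing affects retrieval."
for rate in (0.0, 0.05, 0.2):
    print(rate, add_char_noise(passage, rate))
```

Under these assumptions, a study of this kind would index several corrupted copies of a passage collection, one per noise level, and compare ranking quality against the clean copy.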