Enhancing Sparse Retrieval via Unsupervised Learning

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (SIGIR-AP 2023)

Abstract
Recent work has shown that neural retrieval models excel at text ranking tasks in a supervised setting when given large amounts of manually labeled training data. However, it remains an open question how to train unsupervised retrieval models that are more effective than baselines such as BM25. While some progress has been made in unsupervised dense retrieval models within a bi-encoder architecture, unsupervised sparse retrieval models remain unexplored. We propose BM26, to our knowledge the first such model, which is trained in an unsupervised manner without the need for any human relevance judgments. Evaluations with multiple test collections show that BM26 performs on par with BM25 and outperforms Contriever, the current state-of-the-art unsupervised dense retriever. We further demonstrate two promising avenues to enhance lexical retrieval: First, we can combine BM25 and BM26 using simple vector concatenation to yield an unsupervised hybrid BM51 model that significantly improves over BM25 alone. Second, we can enhance supervised sparse models such as SPLADE with improved initialization using BM26, yielding significant improvements in in-domain and zero-shot retrieval effectiveness.
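A minimal sketch of the hybrid idea described in the abstract, assuming each retriever exposes its query and document representations as sparse term-to-weight mappings; the helper names here are hypothetical illustrations, not the authors' actual API. Concatenating the two sparse vectors over disjoint vocabulary spaces makes the hybrid inner product decompose into the sum of the two retrievers' scores, which is why simple concatenation suffices to fuse BM25 and BM26.

```python
# Sketch of unsupervised hybrid retrieval via sparse vector concatenation.
# Assumption: BM25 and BM26 each yield a sparse term->weight vector for a
# query or document; the example vectors below are made up for illustration.

from typing import Dict

def concat_sparse(v1: Dict[str, float], v2: Dict[str, float]) -> Dict[str, float]:
    """Concatenate two sparse vectors by prefixing keys so vocabularies stay disjoint."""
    hybrid = {f"a:{t}": w for t, w in v1.items()}
    hybrid.update({f"b:{t}": w for t, w in v2.items()})
    return hybrid

def dot(q: Dict[str, float], d: Dict[str, float]) -> float:
    """Inner product of two sparse vectors."""
    return sum(w * d[t] for t, w in q.items() if t in d)

# Hypothetical per-retriever representations of one query and one document.
q_bm25, d_bm25 = {"sparse": 1.2, "retrieval": 0.9}, {"sparse": 2.1, "bm25": 1.0}
q_bm26, d_bm26 = {"sparse": 0.7, "neural": 1.1}, {"sparse": 1.5, "neural": 0.4}

hybrid_q = concat_sparse(q_bm25, q_bm26)
hybrid_d = concat_sparse(d_bm25, d_bm26)

# The hybrid score equals the sum of the individual retrievers' scores.
assert abs(dot(hybrid_q, hybrid_d) - (dot(q_bm25, d_bm25) + dot(q_bm26, d_bm26))) < 1e-9
```

Because the concatenated score is just the sum of the component scores, this kind of hybrid can also be served by running the two sparse indexes independently and adding the results, without materializing the concatenated vectors.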
Keywords
Neural IR, Sparse Retrieval, Unsupervised Learning