End-to-End Query Term Weighting.

KDD 2023

Abstract
Bag-of-words based lexical retrieval systems are still the most commonly used methods for real-world search applications. Recently, deep learning methods have shown promising results for improving retrieval performance, but they are expensive to run in an online fashion, non-trivial to integrate into existing production systems, and might not generalize well in out-of-domain retrieval scenarios. Instead, we build on top of lexical retrievers by proposing a Term Weighting BERT (TW-BERT) model. TW-BERT learns to predict the weight for individual n-gram query input terms (e.g., uni-grams and bi-grams). These inferred weights and terms can be used directly by a retrieval system to perform a query search. To optimize these term weights, TW-BERT incorporates the scoring function used by the search engine, such as BM25, to score query-document pairs. Given sample query-document pairs, we can compute a ranking loss over these matching scores, optimizing the learned query term weights in an end-to-end fashion. Aligning TW-BERT with search engine scorers minimizes the changes needed to integrate it into existing production applications, whereas existing deep learning based search methods would require further infrastructure optimization and additional hardware. The learned weights can be easily utilized by standard lexical retrievers and by other retrieval techniques such as query expansion. We show that TW-BERT improves retrieval performance over strong term weighting baselines on MSMARCO and in out-of-domain retrieval on TREC datasets.
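To make the training setup concrete, the sketch below illustrates the core idea described in the abstract: a model predicts one weight per query term, those weights scale the terms' contributions inside a BM25-style scorer, and a pairwise ranking loss over query-document scores is backpropagated into the weight model. This is a minimal illustration, not the authors' implementation; names such as `TermWeighter` and all toy statistics are hypothetical, and the real TW-BERT runs BERT over the query and pools n-gram spans rather than using a single linear layer.

```python
# Minimal sketch (not the paper's code) of end-to-end query term weighting:
# learned per-term weights enter a BM25-style scorer, and a ranking loss
# over query-document scores is backpropagated into the weight model.

import torch
import torch.nn as nn

K1, B = 1.2, 0.75  # standard BM25 hyperparameters


def weighted_bm25(weights, tf, idf, doc_len, avg_doc_len):
    """BM25 over query terms, with each term's contribution scaled by a
    learned weight instead of the usual uniform query-term weight."""
    norm = tf * (K1 + 1) / (tf + K1 * (1 - B + B * doc_len / avg_doc_len))
    return (weights * idf * norm).sum(dim=-1)


class TermWeighter(nn.Module):
    """Hypothetical stand-in for TW-BERT: maps per-term query features to
    non-negative scalar weights."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, 1)

    def forward(self, term_embs):                        # (n_terms, dim)
        return torch.relu(self.proj(term_embs)).squeeze(-1)  # (n_terms,)


# Toy example: 3 query terms, one relevant and one non-relevant document.
model = TermWeighter(dim=16)
term_embs = torch.randn(3, 16)              # per-term query features (toy)
idf = torch.tensor([2.1, 0.7, 1.5])         # toy corpus statistics
tf_pos = torch.tensor([3.0, 1.0, 2.0])      # term freqs in relevant doc
tf_neg = torch.tensor([0.0, 4.0, 0.0])      # term freqs in non-relevant doc

w = model(term_embs)
s_pos = weighted_bm25(w, tf_pos, idf, doc_len=120.0, avg_doc_len=100.0)
s_neg = weighted_bm25(w, tf_neg, idf, doc_len=90.0, avg_doc_len=100.0)

# Pairwise ranking loss over the BM25 scores; gradients flow through the
# scorer back into the weight model, optimizing the weights end to end.
loss = torch.relu(1.0 - (s_pos - s_neg))    # hinge loss with margin 1
loss.backward()
```

Because the scorer matches the production retrieval function, the weights learned this way can be consumed by an unmodified lexical index at serving time; only the offline training loop needs the differentiable BM25 surrogate.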
Keywords
Information Retrieval, Query Weighting, Language Models