GS²P: A Generative Pre-trained Learning-to-Rank Model with Over-Parameterization for Web-Scale Search

Machine Learning (2024)

Abstract
While learning to rank (LTR) is widely employed in web search to prioritize pertinent webpages among the retrieved content based on input queries, traditional LTR models stumble over two principal obstacles that lead to subpar performance: (1) the lack of well-annotated query-webpage pairs with ranking scores covering search queries across the popularity spectrum, and (2) ill-trained models that are incapable of inducing generalized representations for LTR, culminating in overfitting. To tackle these challenges, we propose a Generative Semi-Supervised Pre-trained (GS²P) learning-to-rank model. Specifically, GS²P first generates pseudo-labels for unlabeled samples using tree-based LTR models after a series of co-training procedures, then learns representations of query-webpage pairs with self-attentive transformers via both a discriminative (LTR) loss and a generative (denoising autoencoding for reconstruction) loss. Finally, GS²P boosts LTR performance by incorporating Random Fourier Features to over-parameterize the model into the "interpolating regime", so as to enjoy the further descent of the generalization error with the learned representations. We conduct extensive offline experiments on a publicly available dataset and a real-world dataset collected from a large-scale search engine. The results show that GS²P achieves the best performance on both datasets compared to all baselines. We also deploy GS²P at a large-scale web search engine with realistic traffic, where we still observe significant improvements in real-world applications. GS²P performs consistently in both online and offline experiments.
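The over-parameterization step above is the distinctive ingredient: the learned query-webpage representations are projected into many more Random Fourier Features than there are training samples, so that a linear ranker on top can interpolate the training scores while generalization error continues to descend. Below is a minimal NumPy sketch of that idea only; the dimensions, the RBF bandwidth, and the synthetic data are illustrative assumptions, not the paper's actual configuration or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: d-dim learned pair representations, n training pairs,
# and D >> n random Fourier features (approximating an RBF kernel).
d, D, n = 64, 4096, 512
gamma = 1.0 / d  # RBF kernel bandwidth (illustrative choice)

# Random projection defining the Fourier feature map.
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)

def rff(X):
    """Map representations X of shape (n, d) to features of shape (n, D)."""
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Toy stand-ins for learned representations and relevance scores.
X_train = rng.normal(size=(n, d))
y_train = rng.normal(size=n)

# With D >> n, the minimum-norm least-squares solution (via the
# pseudo-inverse) interpolates the training scores: the "interpolating
# regime" in which wider models can generalize better rather than worse.
Phi = rff(X_train)
theta = np.linalg.pinv(Phi) @ y_train

print("train MSE:", np.mean((Phi @ theta - y_train) ** 2))  # approximately 0
```

The pseudo-inverse is used here because, among all interpolating linear rankers in the feature space, it returns the minimum-norm one, which is the solution whose generalization behavior the over-parameterization literature analyzes.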
Keywords
Learning to rank, Data reconstruction, Pre-training, Web search