On the Effect of Low-Ranked Documents: A New Sampling Function for Selective Gradient Boosting

38th Annual ACM Symposium on Applied Computing, SAC 2023 (2023)

Abstract
Learning to Rank is the task of learning a ranking function from a set of query-document pairs. A query typically comes with thousands of documents, but not all of them are informative for the learning phase. Different strategies have been designed to select the most informative documents from the training set. However, most of them focus on reducing the size of the training set to speed up learning, sacrificing effectiveness. A first attempt at reconciling efficiency and effectiveness was Selective Gradient Boosting, a learning algorithm that uses a customisable sampling strategy to train effective ranking models. In this work, we propose a new sampling strategy for selecting negative examples, called High_Low_Sampl, applicable to Selective Gradient Boosting without compromising model effectiveness. The proposed strategy allows Selective Gradient Boosting to compose a new training set by selecting three document classes from the original one: the positive examples, high-ranked negative examples, and low-ranked negative examples. The resulting dataset aims at minimising the mis-ranking risk, i.e., enhancing the discriminative power of the learned model while maintaining generalisation to unseen instances. Through an extensive experimental analysis on publicly available datasets, we demonstrate that the proposed selection algorithm makes the most of the negative examples within the training set and leads to models obtaining statistically significant improvements in terms of NDCG compared to the state of the art.
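The abstract only describes High_Low_Sampl at a high level; as a rough illustration of the idea, the Python sketch below composes a per-query training sample from all positive examples, the top-scored negatives, and a random fraction of the remaining low-ranked negatives. The function name and the parameters n_high and p_low are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def high_low_sampl(scores, labels, n_high=10, p_low=0.1, rng=None):
    """Illustrative per-query sampler (hypothetical parameters, not from the paper):
    keep all positives, the n_high top-scored negatives, and a random p_low
    fraction of the remaining low-ranked negatives. Returns document indices."""
    rng = rng or np.random.default_rng()
    pos = np.flatnonzero(labels > 0)              # positive examples: always kept
    neg = np.flatnonzero(labels == 0)             # candidate negatives
    neg_sorted = neg[np.argsort(-scores[neg])]    # negatives ordered by score, best first
    high = neg_sorted[:n_high]                    # high-ranked (hard) negatives
    low_pool = neg_sorted[n_high:]                # remaining low-ranked negatives
    k = int(np.ceil(p_low * len(low_pool)))
    low = rng.choice(low_pool, size=k, replace=False) if k > 0 else low_pool[:0]
    return np.concatenate([pos, high, low])
```

Under this reading, the high-ranked negatives sit near the decision boundary and sharpen the model's discriminative power, while the random slice of low-ranked negatives keeps the sample representative of the full score distribution, which is consistent with the generalisation claim in the abstract.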
Keywords
Learning to Rank, Sampling methods, Gradient Boosting