Constructing Reliable Gradient Exploration for Online Learning to Rank

ACM International Conference on Information and Knowledge Management (2016)

Cited by 30 | Viewed 506
Abstract
With the rapid development of information retrieval (IR) systems, online learning to rank (OLR) approaches, which allow retrieval systems to automatically learn the best parameters from user interactions, have attracted great research interest in recent years. In OLR, an algorithm usually needs to explore some uncertain retrieval results in order to update its current parameters, while still guaranteeing quality retrieval results by exploiting what has already been learned; the final retrieval result is a list interleaved from both the exploratory and the exploitative results. However, existing OLR algorithms perform exploration based on either a single stochastic direction or multiple randomly selected stochastic directions, which introduces large variance and uncertainty into the exploration and may further harm retrieval quality. Moreover, little historical exploration knowledge is considered when conducting the current exploration. In this paper, we propose two online learning to rank algorithms that improve the reliability of exploration by constructing robust exploratory directions. First, we describe a Dual-Point Dueling Bandit Gradient Descent (DP-DBGD) approach with a Contextual Interleaving (CI) method. In particular, DP-DBGD conducts exploration carefully via two opposite stochastic directions, and the proposed CI method constructs a qualified interleaved retrieval result list by taking historical explorations into account. Second, we introduce a Multi-Point Deterministic Gradient Descent (MP-DGD) method that constructs a set of deterministic standard unit basis vectors for exploration. In MP-DGD, each basis direction is explored, and the parameters are updated by walking along the combination of the winning basis vectors.
We conduct experiments on several datasets and show that both DP-DBGD and MP-DGD improve online learning to rank performance by over 10% compared with baseline methods.
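The dual-point exploration idea can be sketched roughly as follows. This is a minimal illustration, not the paper's actual algorithm: the `duel` callback, the exploration step `delta`, and the update step `eta` are hypothetical placeholders, and the real DP-DBGD pairs this update with the CI interleaving method, which is not shown here.

```python
import numpy as np

def dp_dbgd_step(w, delta, eta, duel, rng=None):
    """One dual-point exploration step: probe two OPPOSITE stochastic
    directions around the current weights w. `duel(a, b)` is assumed
    to return the winning candidate of an interleaved comparison
    (e.g. inferred from user clicks), or None on a tie."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.standard_normal(w.shape)
    u /= np.linalg.norm(u)            # random unit direction
    w_plus = w + delta * u            # exploratory ranker 1
    w_minus = w - delta * u           # opposite exploratory ranker
    winner = duel(w_plus, w_minus)    # interleaved comparison
    if winner is w_plus:              # walk toward the winner
        return w + eta * u
    if winner is w_minus:
        return w - eta * u
    return w                          # tie: keep current weights
```

Exploring two opposite directions lets a single random probe decide between `+u` and `-u`, rather than comparing one random perturbation against the current ranker as in single-point DBGD.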
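The multi-point deterministic exploration can likewise be sketched under stated assumptions: here each standard unit basis vector is probed in turn, and the winners are combined into a single update direction. The `beats_current` callback, the normalization of the combined direction, and the step sizes are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def mp_dgd_step(w, delta, eta, beats_current):
    """One multi-point deterministic step: probe each standard unit
    basis vector e_i and walk along the combination of the winners.
    `beats_current(candidate)` is assumed to return True when the
    perturbed ranker wins an interleaved comparison against w."""
    winners = []
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = 1.0                        # deterministic basis direction
        if beats_current(w + delta * e):  # explore this direction
            winners.append(e)
    if not winners:
        return w                          # no direction won: stay put
    g = np.sum(winners, axis=0)
    g /= np.linalg.norm(g)                # combined exploratory winner
    return w + eta * g
```

Because the basis vectors are fixed rather than sampled, repeated runs probe the same directions, which is the sense in which this exploration is deterministic rather than stochastic.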
Keywords
Online Learning to Rank, Dual-Point Dueling Bandit Gradient Descent, Multi-Point Deterministic Gradient Descent, Interleaved Comparison