From Greedy Selection to Exploratory Decision-Making: Diverse Ranking with Policy-Value Networks

SIGIR (2018)

Citations: 38 | Views: 44
Abstract
The goal of search result diversification is to select a subset of documents from the candidate set that covers as many different subtopics as possible. In general, this is a subset selection problem, and selecting an optimal subset of documents is NP-hard. Existing methods usually formalize the problem as ranking the documents through greedy sequential document selection: at each ranking position, the document that provides the largest amount of additional information is selected. Because each choice is made without looking ahead to later positions and is never revised, greedy selection inevitably produces suboptimal rankings in some cases. In this paper we propose to partially alleviate the problem with a Monte Carlo tree search (MCTS) enhanced Markov decision process (MDP), referred to as M$^2$Div. In M$^2$Div, the construction of a diverse ranking is formalized as an MDP in which each action corresponds to selecting a document for one ranking position. Given an MDP state consisting of the query, the selected documents, and the candidate documents, a recurrent neural network produces a policy function that guides document selection and a value function that predicts the quality of the whole ranking. The raw policy and value are then strengthened with MCTS, which explores the possible rankings at subsequent positions and yields a better search policy for decision-making. Experimental results on the TREC benchmarks show that M$^2$Div significantly outperforms state-of-the-art baselines based on greedy sequential document selection, indicating the effectiveness of the exploratory decision-making mechanism in M$^2$Div.
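The contrast between greedy selection and search-guided selection can be illustrated with a small, self-contained sketch. Everything here is a hypothetical stand-in, not the paper's model: the three toy documents and their subtopic sets are invented, the quality measure is plain subtopic coverage (not the paper's evaluation metric), the "policy" is a uniform prior over remaining candidates in place of the RNN policy output, and the "value" is the coverage of a greedy completion in place of the RNN value output. The tree-search selection rule is assumed to be an AlphaGo Zero-style PUCT rule, and the document at each position is chosen by root visit counts.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy candidate pool: each document covers a set of subtopic ids.
DOCS: List[Set[int]] = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}]
ALL_SUBTOPICS: Set[int] = set().union(*DOCS)


def coverage(ranking: Sequence[int]) -> float:
    """Fraction of subtopics covered (a simplified stand-in quality measure)."""
    covered: Set[int] = set()
    for d in ranking:
        covered |= DOCS[d]
    return len(covered) / len(ALL_SUBTOPICS)


def greedy_rank(k: int, prefix: Tuple[int, ...] = ()) -> Tuple[int, ...]:
    """Greedy sequential selection: append the document adding the most coverage."""
    ranking = list(prefix)
    while len(ranking) < k:
        rest = [d for d in range(len(DOCS)) if d not in ranking]
        ranking.append(max(rest, key=lambda d: coverage(ranking + [d])))
    return tuple(ranking)


class Node:
    def __init__(self, prior: float) -> None:
        self.prior = prior            # prior probability from the (stand-in) policy
        self.visits = 0               # visit count N
        self.value_sum = 0.0          # accumulated value W
        self.children: Dict[int, "Node"] = {}

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def policy(state: Tuple[int, ...]) -> Dict[int, float]:
    """Hypothetical stand-in for the policy output: uniform over remaining docs."""
    rest = [d for d in range(len(DOCS)) if d not in state]
    return {d: 1.0 / len(rest) for d in rest}


def value(state: Tuple[int, ...], k: int) -> float:
    """Hypothetical stand-in for the value output: score of a greedy completion."""
    return coverage(greedy_rank(k, state))


def mcts_choose(state: Tuple[int, ...], k: int, n_sim: int = 400, c_puct: float = 1.0) -> int:
    """Pick the document for the next position by MCTS visit counts (assumes k <= len(DOCS))."""
    root = Node(prior=1.0)
    for _ in range(n_sim):
        node, s, path = root, state, [root]
        # Selection: descend with the PUCT rule until an unexpanded node is reached.
        while node.children:
            sqrt_n = math.sqrt(node.visits)
            a, node = max(
                node.children.items(),
                key=lambda kv: kv[1].q() + c_puct * kv[1].prior * sqrt_n / (1 + kv[1].visits),
            )
            s = s + (a,)
            path.append(node)
        # Expansion: add children for non-terminal states; terminal rankings stay leaves.
        if len(s) < k:
            node.children = {a: Node(p) for a, p in policy(s).items()}
        # Evaluation and backup: propagate the whole-ranking value along the path.
        leaf_value = value(s, k)
        for n in path:
            n.visits += 1
            n.value_sum += leaf_value
    return max(root.children, key=lambda a: root.children[a].visits)


def mcts_rank(k: int) -> Tuple[int, ...]:
    """Build the ranking position by position, running a fresh search at each step."""
    state: Tuple[int, ...] = ()
    while len(state) < k:
        state = state + (mcts_choose(state, k),)
    return state


if __name__ == "__main__":
    g, m = greedy_rank(2), mcts_rank(2)
    print("greedy:", g, coverage(g))
    print("mcts:  ", m, coverage(m))
```

On this toy pool with two ranking positions, greedy selection takes document 0 first (it covers the most subtopics on its own) and ends up covering 5 of the 6 subtopics, whereas the search-guided ranking selects documents 1 and 2 and covers all 6. The example only shows why lookahead can beat greedy choices; it makes no claim about how M$^2$Div's networks or search are configured.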
Keywords
Diverse ranking, Markov decision process, Monte Carlo tree search