Diversified Relevance Feedback

Matt Crane

SIGIR '13: The 36th International ACM SIGIR conference on research and development in Information Retrieval Dublin Ireland July, 2013（2013）

引用 1|浏览42

暂无评分

摘要

The need for a search engine to deal with ambiguous queries has been known for a long time (diversification). However, it is only recently that this need has become a focus within information retrieval research. How to respond to indications that a result is relevant to a query (relevance feedback) has also been a long focus of research. When thinking about the results for a query as being clustered by topic, these two areas of information retrieval research appear to be opposed to each other. Interestingly though, they both appear to improve the performance of search engines, raising the question: they can be combined or made to work with each other?When presented with an ambiguous query there are a number of techniques that can be employed to better select results. The primary technique being researched now is diversification, which aims to populate the results with a set of documents that cover different possible interpretations for the query, while maintaining a degree of relevance, as determined by the search engine. For example, given a query of "java" it is unclear whether the user, without any other information, means the programming language, the coffee, the island of Indonesia or a multitude of other meanings.In order to do this the assumption that documents are independent of each other when assessing potential relevance has to be broken. That is, a documents relevance, as calculated by the search engine, is no longer dependent only on the query, but also the other documents that have been selected. How a document is identified as being similar to previously selected documents, and the trade off between estimated relevance and topic coverage are current areas for information retrieval research.For unambiguous queries, or for search engines that do not perform diversification, it is possible to improve the results selected by reacting to information identifying a given result as truly relevant or not. This mechanism is known as relevance feedback. The most common response to relevance feedback is to investigate the documents for their most content-bearing terms, and either add, or subtract, their influence to a newly formed query which is then rerun on the remaining documents to re-order them.There has been a scant amount of research into the combination of these methods. However, Carbonell et al. [1] show that an initially diverse result set can provide a better approach for identifying the topic a user is interested in for a relevance feedback style approach. This approach was further extended by Raman et al. [4].An important aspect of relevance feedback is the selection of documents to use. In the 2008 TREC relevance feedback track, Meij et al. [3] generated a diversified result set which outperformed other rankings as a source of feedback documents.The use of pseudo-relevance feedback (assuming the top ranked documents are relevant) to extract sub-topics for use in diversification was explored by Santos et al. [5]. These previous approaches suggest that these two ideas are more linked than expected.The ATIRE search engine [6] will be used to further explore the relationship between diversification and relevance feedback. ATIRE was selected because it is developed locally, and is designed to be small and fast. ATIRE also produces a competitive baseline, which would have placed 6th in the 2011 TREC diversity task while performing no diversification and index-time spam filtering [2], although we concede this is not equivalent to submitting a run.

查看译文

关键词

Diversification,Relevance feedback

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要