Biasing web search results for topic familiarity
CIKM (2005)
Abstract
Depending on a web searcher's familiarity with a query's target topic, it may be more appropriate to show her introductory or advanced documents. The TREC HARD [1] track defined topic familiarity as meta-data associated with a user's query. We instead define a user-independent and query-independent model of the topic familiarity required to read a document, so that it can be matched to a given user in response to a query. An introductory web page is defined as "a web page that doesn't presuppose any background knowledge of the topic it is on, and to an extent introduces or defines the key terms in the topic", while an advanced web page is defined as "a web page that assumes sufficient background knowledge of the topic it is on, and familiarity with the key technical/important terms in the topic, and potentially builds on them". We develop a method for biasing the initial mix of documents returned by a search engine to increase the number of documents of the desired familiarity level up to position 5, and up to position 10. Our method involves building a supervised text classifier that incorporates features based on reading level, the distribution of stop-words in the text, and non-text features such as average line length. Using this familiarity classifier, we achieve statistically significant improvements in reranking the result set to show introductory documents higher in the ranked list. Our classifier can be integrated into current search engine technology without major modifications to existing architectures.
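The feature families and the reranking step named in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the ten-word stop-word list, the use of average word and sentence length as crude reading-level proxies, and the names `familiarity_features`, `rerank`, and `score_fn` (a stand-in for a trained classifier's "introductory" score).

```python
import re

# Tiny illustrative stop-word list (an assumption; the abstract does not give one).
STOP_WORDS = ("the", "of", "and", "a", "to", "in", "is", "that", "it", "for")


def familiarity_features(text):
    """Return simple features of the kinds the abstract names (hypothetical helper)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    features = {
        # Non-text feature: average length of non-empty lines, in characters.
        "avg_line_length": sum(len(ln) for ln in lines) / max(len(lines), 1),
        # Crude reading-level proxies (assumed; not the paper's exact measures).
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / max(len(sentences), 1),
    }
    # Stop-word distribution: relative frequency of each stop word.
    for sw in STOP_WORDS:
        features["stop_" + sw] = words.count(sw) / n_words
    return features


def rerank(results, score_fn, top_k=10):
    """Reorder only the first top_k results by descending score.

    Ties keep their original order, and results after top_k are left unchanged.
    score_fn is a hypothetical stand-in for a trained familiarity classifier.
    """
    head = sorted(results[:top_k], key=score_fn, reverse=True)
    return head + results[top_k:]
```

In this sketch a supervised classifier would be trained on the feature dictionaries and its score passed as `score_fn`. Reordering only the top of the list reflects the abstract's focus on positions 5 and 10, and it leaves the underlying search engine untouched.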
Keywords
supervised text classifier, familiarity classifier, introductory web page, web page, familiarity level, target topic, web searcher, topic familiarity, biasing web search result, introductory document, advanced web page, statistical significance, web pages, personalization, search engine