Mining Search Engine Clickthrough Log for Matching N-gram Features.
EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 (2009)
Abstract
User clicks on a URL in response to a query are extremely useful predictors of the URL's relevance to that query. However, exact match click features tend to suffer from severe data sparsity in web ranking. This sparsity is particularly pronounced for new URLs and long queries, where each distinct query-url pair rarely occurs. To remedy this, we present a set of straightforward yet informative query-url n-gram features that allows limited user click data to generalize to large numbers of unseen query-url pairs. The method is motivated by techniques used in the NLP community for dealing with unseen words. We find that there are interesting regularities across queries and their preferred destination URLs; for example, queries containing "form" tend to lead to clicks on URLs containing "pdf". We evaluate our new query-url features on a web search ranking task and obtain improvements that are statistically significant at the p < 0.0001 level over a strong baseline with exact match clickthrough features.
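The abstract does not give the exact feature definitions, so the following is only a minimal sketch of the general idea, under stated assumptions: queries are split into word unigrams and bigrams, URLs are split into alphanumeric tokens, and click counts are aggregated over every (query n-gram, URL token) pair. The function names, the tokenization, the additive score, and the toy click log are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter
from urllib.parse import urlparse


def query_ngrams(query, max_n=2):
    """Word n-grams (n = 1..max_n) of a lowercased query."""
    tokens = query.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def url_tokens(url):
    """Alphanumeric tokens from the host and path of a URL (scheme dropped)."""
    parsed = urlparse(url.lower())
    return [t for t in re.split(r"[^a-z0-9]+", parsed.netloc + parsed.path) if t]


def count_pairs(click_log):
    """Aggregate clicks over (query n-gram, URL token) pairs.

    click_log is an iterable of (query, url, clicks) triples.
    """
    pair_counts = Counter()
    for query, url, clicks in click_log:
        for q in set(query_ngrams(query)):
            for u in set(url_tokens(url)):
                pair_counts[(q, u)] += clicks
    return pair_counts


def ngram_click_score(query, url, pair_counts):
    """Assumed scoring rule: sum of aggregated clicks over matching pairs."""
    return sum(
        pair_counts[(q, u)]
        for q in set(query_ngrams(query))
        for u in set(url_tokens(url))
    )


# Toy click log (made-up data, for illustration only).
click_log = [
    ("tax form 1040", "http://example.org/docs/f1040.pdf", 12),
    ("visa application form", "http://example.com/visa/apply.pdf", 7),
    ("weather seattle", "http://example.net/weather/seattle", 20),
]
counts = count_pairs(click_log)

# This query-url pair never appears in the log, so an exact match click
# feature would be empty, yet the n-gram pairs still carry signal.
unseen_score = ngram_click_score(
    "passport renewal form", "http://example.gov/passport/ds82.pdf", counts
)
```

In this toy log the pair ("form", "pdf") accumulates 12 + 7 = 19 clicks, which mirrors the "form"/"pdf" regularity mentioned above, and the unseen query-url pair receives a nonzero score through it. A real system would likely normalize these counts and feed them to a learned ranker rather than summing them directly.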
Keywords
distinct query-url pair, informative query-url n-gram feature, new query-url feature, unseen query-url pair, new URLs, preferred destination URLs, exact match click feature, limited user click data, long query, severe data, N-gram feature, mining search engine clickthrough