Optimizing search engines using clickthrough data

KDD, pp.133-142, (2002)

Abstract

This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from ex...

Introduction
  • Which WWW page(s) does a user want to retrieve when he types some keywords into a search engine? There are typically thousands of pages that contain these words, but the user is interested in a much smaller subset.
  • One could ask the user for feedback.
  • If the authors knew the set of pages relevant to the user’s query, the authors could use this as training data for optimizing the retrieval function.
  • Experience shows that users are only rarely willing to give explicit feedback.
  • This paper argues that sufficient information is already hidden in the logfiles of WWW search engines.
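One way such logfiles can be turned into training data is to read each click as a relative preference: a clicked link is taken to be preferred over the unclicked links ranked above it. The sketch below is a minimal illustration under that assumption; the function name and the document identifiers are hypothetical and do not come from the page above.

```python
from typing import List, Set, Tuple


def preferences_from_clicks(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs implied by the clicks on one ranking."""
    pairs = []
    for i, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        # A clicked link is assumed to be preferred over every link that was
        # ranked above it but skipped by the user.
        for skipped in ranking[:i]:
            if skipped not in clicked:
                pairs.append((doc, skipped))
    return pairs


# Illustrative log entry: seven links were shown and the user clicked links 1, 3 and 7.
shown = ["d1", "d2", "d3", "d4", "d5", "d6", "d7"]
pairs = preferences_from_clicks(shown, {"d1", "d3", "d7"})
print(pairs)
# [('d3', 'd2'), ('d7', 'd2'), ('d7', 'd4'), ('d7', 'd5'), ('d7', 'd6')]
```

Note that the click on the top link yields no pair, since nothing was skipped above it. The result is partial, relative feedback rather than absolute relevance labels.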
Highlights
  • Which WWW page(s) does a user actually want to retrieve when he types some keywords into a search engine? There are typically thousands of pages that contain these words, but the user is interested in a much smaller subset
  • The experimental results show that the Ranking Support Vector Machine can successfully learn an improved retrieval function from clickthrough data
  • This paper presented an approach to mining logfiles of WWW search engines with the goal of improving their retrieval performance automatically
  • Experimental results show that the algorithm performs well in practice, successfully adapting the retrieval function of a meta-search engine to the preferences of a group of users
  • This paper opens a series of questions regarding the use of machine learning in search engines
  • Is it possible to use clustering algorithms to find homogeneous groups of users? Can clickthrough data be used to adapt a search engine not to a group of users, but to the properties of a particular document collection? In particular, the factory settings of any off-the-shelf retrieval system are necessarily suboptimal for any particular collection
Methods
  • The following experiments verify whether the inferences drawn from the clickthrough data are justified, and whether the Ranking SVM can successfully use such partial preference data.
  • The experimental setup, in the framework of a meta-search engine, is described first.
  • It is followed by the results of an offline experiment and an online experiment.
  • The offline experiment is designed to verify that the Ranking SVM can learn a retrieval function maximizing Kendall’s τ on partial preference feedback.
  • Meta-search engines combine the results of several basic search engines without having a database of their own.
  • Such a setup has several advantages.
  • The basic search engines provide a basis for comparison.
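Kendall's τ, the measure the offline experiment targets, compares two rankings by counting the pairs of documents they order the same way (concordant, P) and the opposite way (discordant, Q), giving τ = (P − Q) / (P + Q). The following is a minimal sketch assuming both rankings cover the same items and are given as item-to-position mappings; the function name is illustrative.

```python
from itertools import combinations
from typing import Dict, Hashable


def kendall_tau(rank_a: Dict[Hashable, int], rank_b: Dict[Hashable, int]) -> float:
    """Kendall's tau between two rankings given as item -> position mappings."""
    concordant = 0
    discordant = 0
    for x, y in combinations(list(rank_a), 2):
        # The product is positive when both rankings order x and y the same way.
        agreement = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    total = concordant + discordant
    if total == 0:
        raise ValueError("no comparable pairs")
    return (concordant - discordant) / total


target = {"a": 1, "b": 2, "c": 3}
learned = {"a": 1, "b": 3, "c": 2}
print(kendall_tau(target, target))   # 1.0
print(kendall_tau(target, learned))  # two concordant pairs, one discordant
```

With only partial preference feedback, the same count can be restricted to the pairs for which a preference was actually observed.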
Conclusion
  • The experimental results show that the Ranking SVM can successfully learn an improved retrieval function from clickthrough data.
  • Without any explicit feedback or manual parameter tuning, it has automatically adapted to the particular preferences of a group of ≈ 20 users.
  • This improvement is not only a verification that the Ranking SVM can learn using partial ranking feedback, but an argument for personalizing retrieval functions.
  • Experimental results show that the algorithm performs well in practice, successfully adapting the retrieval function of a meta-search engine to the preferences of a group of users.
  • Shipping off-the-shelf search engines with learning capabilities would enable them to optimize their performance automatically after being installed in a company intranet
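To make "learning from partial ranking feedback" concrete: a Ranking SVM with a linear retrieval function can be viewed as a classifier on feature differences, where each preference "document i over document j" asks that w · (x_i − x_j) be at least 1, up to slack. The sketch below minimizes the corresponding regularized hinge loss with plain subgradient descent. It is an illustration only: the solver, the hyperparameters, and the toy feature vectors are assumptions, not the training procedure used in the paper.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def score(w: Vector, features: Vector) -> float:
    """Linear retrieval score w . x."""
    return sum(wi * xi for wi, xi in zip(w, features))


def train_ranking_svm(pairs: List[Tuple[Vector, Vector]], c: float = 1.0,
                      epochs: int = 200, lr: float = 0.01) -> List[float]:
    """Fit w from (preferred_features, less_preferred_features) pairs.

    Minimizes 0.5 * ||w||^2 + c * sum(max(0, 1 - w . (better - worse)))
    by batch subgradient descent (an assumed, simple solver).
    """
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)  # gradient of the regularizer 0.5 * ||w||^2
        for better, worse in pairs:
            diff = [b - s for b, s in zip(better, worse)]
            if score(w, diff) < 1.0:  # margin violated: hinge term is active
                for k in range(dim):
                    grad[k] -= c * diff[k]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Toy preference pairs with two hypothetical features per query-document pair.
pairs = [
    ((1.0, 0.0), (0.0, 1.0)),
    ((0.8, 0.3), (0.2, 0.5)),
    ((0.9, 0.1), (0.4, 0.4)),
]
w = train_ranking_svm(pairs)
print(all(score(w, better) > score(w, worse) for better, worse in pairs))  # True
```

Sorting candidate documents by `score` then gives the learned ranking for a query.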
Tables
  • Table 1: Average clickrank for three retrieval functions (“bxx”, “tfc” [23], and a “hand-tuned” strategy that uses different weights according to HTML tags) implemented in LASER. Rows correspond to the retrieval method used by LASER at query time; columns hold values from subsequent evaluation with other methods. Figures reported are means and two standard errors. The data for this table is taken from [5].
  • Table 2: Pairwise comparison of the learned retrieval function with Google, MSNSearch, and the non-learning meta-search ranking. The counts indicate for how many queries a user clicked on more links from the top of the ranking returned by the respective retrieval function.
  • Table 3: Features with largest and smallest weights as learned from the training data in the online experiment.
Funding
  • Presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data
  • The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking
  • Taking a Support Vector Machine approach, this paper presents a method for learning retrieval functions
  • Presents an approach to learning retrieval functions by analyzing which links the users click on in the presented ranking
Reference
  • [1] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley-Longman, Harlow, UK, May 1999.
  • [2] B. Bartell, G. Cottrell, and R. Belew. Automatic combination of multiple ranked retrieval systems. In Annual ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR), 1994.
  • [3] D. Beeferman and A. Berger. Agglomerative clustering of a search engine query log. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2000.
  • [4] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, 1992.
  • [5] J. Boyan, D. Freitag, and T. Joachims. A machine learning architecture for optimizing web search engines. In AAAI Workshop on Internet Based Information Systems, August 1996.
  • [6] W. Cohen, R. Schapire, and Y. Singer. Learning to order things. Journal of Artificial Intelligence Research, 10, 1999.
  • [7] C. Cortes and V. N. Vapnik. Support-vector networks. Machine Learning Journal, 20:273–297, 1995.
  • [8] K. Crammer and Y. Singer. Pranking with ranking. In Advances in Neural Information Processing Systems (NIPS), 2001.
  • [9] Y. Freund, R. Iyer, R. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. In International Conference on Machine Learning (ICML), 1998.
  • [10] N. Fuhr. Optimum polynomial retrieval functions based on the probability ranking principle. ACM Transactions on Information Systems, 7(3):183–204, 1989.
  • [11] N. Fuhr, S. Hartmann, G. Lustig, M. Schwantner, K. Tzeras, and G. Knorz. Air/x - a rule-based multistage indexing system for large subject fields. In RIAO, pages 606–623, 1991.
  • [12] R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers, pages 115–132. MIT Press, Cambridge, MA, 2000.
  • [13] K. Höffgen, H. Simon, and K. van Horn. Robust trainability of single neurons. Journal of Computer and System Sciences, 50:114–125, 1995.
  • [14] T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, chapter 11. MIT Press, Cambridge, MA, 1999.
  • [15] T. Joachims. Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms. Kluwer, 2002.
  • [16] T. Joachims. Unbiased evaluation of retrieval quality using clickthrough data. Technical report, Cornell University, Department of Computer Science, 2002. http://www.joachims.org.
  • [17] T. Joachims, D. Freitag, and T. Mitchell. WebWatcher: a tour guide for the world wide web. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), volume 1, pages 770–777. Morgan Kaufmann, 1997.
  • [18] J. Kemeny and L. Snell. Mathematical Models in the Social Sciences. Ginn & Co, 1962.
  • [19] M. Kendall. Rank Correlation Methods. Hafner, 1955.
  • [20] H. Lieberman. Letizia: An agent that assists Web browsing. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI ’95), Montreal, Canada, 1995. Morgan Kaufmann.
  • [21] A. Mood, F. Graybill, and D. Boes. Introduction to the Theory of Statistics. McGraw-Hill, 3rd edition, 1974.
  • [22] L. Page and S. Brin. Pagerank, an eigenvector based ranking approach for hypertext. In 21st Annual ACM/SIGIR International Conference on Research and Development in Information Retrieval, 1998.
  • [23] G. Salton and C. Buckley. Term weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513–523, 1988.
  • [24] C. Silverstein, M. Henzinger, H. Marais, and M. Moricz. Analysis of a very large AltaVista query log. Technical Report SRC 1998-014, Digital Systems Research Center, 1998.
  • [25] V. Vapnik. Statistical Learning Theory. Wiley, Chichester, GB, 1998.
  • [26] Y. Yao. Measuring retrieval effectiveness based on user preference of documents. Journal of the American Society for Information Science, 46(2):133–145, 1995.