Optimizing search engines using clickthrough data
KDD, pp. 133-142, 2002
This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from ex…
- Which WWW page(s) does a user want to retrieve when he types some keywords into a search engine? There are typically thousands of pages that contain these words, but the user is interested in a much smaller subset.
- One could ask the user for feedback.
- If the set of pages relevant to the user's query were known, it could be used as training data for optimizing the retrieval function.
- Experience shows that users are only rarely willing to give explicit feedback.
- This paper argues that sufficient information is already hidden in the logfiles of WWW search engines.
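The information hidden in the logfiles takes the form of relative preferences: in this line of work, a click is taken as evidence that the clicked result is preferred over unclicked results ranked above it. A minimal sketch of that extraction rule (hypothetical document ids, illustrative only):

```python
def preferences_from_clicks(ranking, clicked):
    """Derive pairwise preferences from one query's clickthrough record.

    ranking: list of document ids in the order they were presented.
    clicked: set of document ids the user clicked on.

    Rule: a clicked document is preferred over every unclicked document
    ranked above it. The result is partial preference data, not absolute
    relevance judgments.
    """
    prefs = []
    for i, doc in enumerate(ranking):
        if doc in clicked:
            for higher in ranking[:i]:
                if higher not in clicked:
                    prefs.append((doc, higher))
    return prefs

# The user was shown five links and clicked the 3rd and 5th.
ranking = ["doc1", "doc2", "doc3", "doc4", "doc5"]
clicked = {"doc3", "doc5"}
print(preferences_from_clicks(ranking, clicked))
# doc3 preferred over doc1 and doc2; doc5 over doc1, doc2, and doc4
```

Note that such pairs say nothing about documents the user never saw, which is why the learning method must cope with partial rankings.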
- The experimental results show that the Ranking Support Vector Machine can successfully learn an improved retrieval function from clickthrough data
- This paper presented an approach to mining logfiles of WWW search engines with the goal of improving their retrieval performance automatically
- Experimental results show that the algorithm performs well in practice, successfully adapting the retrieval function of a meta-search engine to the preferences of a group of users
- This paper opens a series of questions regarding the use of machine learning in search engines
- Is it possible to use clustering algorithms to find homogeneous groups of users? Can clickthrough data be used to adapt a search engine not to a group of users, but to the properties of a particular document collection? In particular, the factory settings of any off-the-shelf retrieval system are necessarily suboptimal for any particular collection
- The following experiments verify whether the inferences drawn from the clickthrough data are justified, and whether the Ranking SVM can successfully use such partial preference data.
- The experiment setup in the framework of a meta-search engine is described
- This is followed by the results of an offline experiment and an online experiment.
- The offline experiment is designed to verify that the Ranking SVM can learn a retrieval function maximizing Kendall’s τ on partial preference feedback.
- Meta-search engines combine the results of several basic search engines without having a database of their own
- Such a setup has several advantages.
- The basic search engines provide a basis for comparison
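Kendall's τ, the objective mentioned above, measures agreement between two rankings by comparing concordant and discordant item pairs: τ = (P − Q) / (P + Q), where P counts pairs ordered the same way in both rankings and Q counts pairs ordered oppositely. A small illustrative implementation (not the paper's code):

```python
from itertools import combinations

def kendalls_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same items.

    rank_a, rank_b: lists of the same items, each in ranked order.
    Returns (P - Q) / (P + Q), where P counts concordant pairs and
    Q counts discordant pairs.
    """
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # The pair is concordant if both rankings order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

print(kendalls_tau(list("abcd"), list("abcd")))  # 1.0 for identical rankings
print(kendalls_tau(list("abcd"), list("dcba")))  # -1.0 for reversed rankings
print(kendalls_tau(list("abcd"), list("abdc")))  # one discordant pair of six -> 2/3
```

Because τ depends only on pairwise orderings, it can be estimated from exactly the kind of partial pairwise preferences that clickthrough data provides.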
- DISCUSSION AND RELATED WORK
- The experimental results show that the Ranking SVM can successfully learn an improved retrieval function from clickthrough data.
- Without any explicit feedback or manual parameter tuning, it has automatically adapted to the particular preferences of a group of ≈ 20 users
- This improvement is not only a verification that the Ranking SVM can learn using partial ranking feedback, but an argument for personalizing retrieval functions.
- Shipping off-the-shelf search engines with learning capabilities would enable them to optimize their performance automatically after being installed in a company intranet
- Table 1: Average clickrank for three retrieval functions ("bxx", "tfc" [23], and a "hand-tuned" strategy that uses different weights according to HTML tags) implemented in LASER. Rows correspond to the retrieval method used by LASER at query time; columns hold values from subsequent evaluation with other methods. Figures reported are means and two standard errors. The data for this table is taken from [5]
- Table 2: Pairwise comparison of the learned retrieval function with Google, MSNSearch, and the non-learning meta-search ranking. The counts indicate for how many queries a user clicked on more links from the top of the ranking returned by the respective retrieval function
- Table 3: Features with largest and smallest weights as learned from the training data in the online experiment
- Presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data
- The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking
- Taking a Support Vector Machine approach, this paper presents a method for learning retrieval functions
- Presents an approach to learning retrieval functions by analyzing which links the users click on in the presented ranking
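The Ranking SVM idea can be understood as a reduction from ranking to binary classification: learn a weight vector w such that w·φ(q, d_i) > w·φ(q, d_j) whenever d_i is preferred over d_j, i.e. classify each feature-difference vector φ(q, d_i) − φ(q, d_j) as positive. A toy, dependency-free sketch of that reduction, using sub-gradient descent on the hinge loss (illustrative only; the paper's actual optimizer and features differ):

```python
def train_ranking_weights(pairs, dim, epochs=200, lr=0.1, reg=0.01):
    """Learn w so that w . x_pref > w . x_other for each preference pair.

    pairs: list of (x_pref, x_other) feature vectors (lists of floats).
    Minimizes the hinge loss on difference vectors plus L2 regularization,
    the same reduction a Ranking SVM uses, with a simpler optimizer.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for x_pref, x_other in pairs:
            diff = [a - b for a, b in zip(x_pref, x_other)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Sub-gradient step on max(0, 1 - w.diff) + (reg/2) * |w|^2
            if margin < 1:
                w = [wi + lr * (di - reg * wi) for wi, di in zip(w, diff)]
            else:
                w = [wi * (1 - lr * reg) for wi in w]
    return w

# Hypothetical 2-d features (term-match score, link popularity); the
# preferences say the first feature matters more for these users.
pairs = [([1.0, 0.0], [0.0, 1.0]),
         ([0.9, 0.2], [0.1, 0.8])]
w = train_ranking_weights(pairs, dim=2)
# The learned w now scores each preferred document above its alternative.
assert all(sum(wi * (a - b) for wi, a, b in zip(w, xp, xo)) > 0
           for xp, xo in pairs)
```

Once trained, the same w ranks unseen documents by sorting them on w·φ(q, d), which is what lets the learned function replace a hand-tuned one.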
- R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley-Longman, Harlow, UK, May 1999.
- B. Bartell, G. Cottrell, and R. Belew. Automatic combination of multiple ranked retrieval systems. In Annual ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR), 1994.
- D. Beeferman and A. Berger. Agglomerative clustering of a search engine query log. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2000.
- B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, 1992.
- J. Boyan, D. Freitag, and T. Joachims. A machine learning architecture for optimizing web search engines. In AAAI Workshop on Internet Based Information Systems, August 1996.
- W. Cohen, R. Schapire, and Y. Singer. Learning to order things. Journal of Artificial Intelligence Research, 10, 1999.
- C. Cortes and V. N. Vapnik. Support–vector networks. Machine Learning Journal, 20:273–297, 1995.
- K. Crammer and Y. Singer. Pranking with ranking. In Advances in Neural Information Processing Systems (NIPS), 2001.
- Y. Freund, R. Iyer, R. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. In International Conference on Machine Learning (ICML), 1998.
- N. Fuhr. Optimum polynomial retrieval functions based on the probability ranking principle. ACM Transactions on Information Systems, 7(3):183–204, 1989.
- N. Fuhr, S. Hartmann, G. Lustig, M. Schwantner, K. Tzeras, and G. Knorz. AIR/X - a rule-based multistage indexing system for large subject fields. In RIAO, pages 606–623, 1991.
- R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers, pages 115–132. MIT Press, Cambridge, MA, 2000.
- K. Hoffgen, H. Simon, and K. van Horn. Robust trainability of single neurons. Journal of Computer and System Sciences, 50:114–125, 1995.
- T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, chapter 11. MIT Press, Cambridge, MA, 1999.
- T. Joachims. Learning to Classify Text Using Support Vector Machines – Methods, Theory, and Algorithms. Kluwer, 2002.
- T. Joachims. Unbiased evaluation of retrieval quality using clickthrough data. Technical report, Cornell University, Department of Computer Science, 2002. http://www.joachims.org.
- T. Joachims, D. Freitag, and T. Mitchell. WebWatcher: a tour guide for the world wide web. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), volume 1, pages 770 – 777. Morgan Kaufmann, 1997.
- J. Kemeny and L. Snell. Mathematical Models in the Social Sciences. Ginn & Co, 1962.
- M. Kendall. Rank Correlation Methods. Hafner, 1955.
- H. Lieberman. Letizia: An agent that assists Web browsing. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI ’95), Montreal, Canada, 1995. Morgan Kaufmann.
- A. Mood, F. Graybill, and D. Boes. Introduction to the Theory of Statistics. McGraw-Hill, 3 edition, 1974.
- L. Page and S. Brin. Pagerank, an eigenvector based ranking approach for hypertext. In 21st Annual ACM/SIGIR International Conference on Research and Development in Information Retrieval, 1998.
- G. Salton and C. Buckley. Term weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513–523, 1988.
- C. Silverstein, M. Henzinger, H. Marais, and M. Moricz. Analysis of a very large altavista query log. Technical Report SRC 1998-014, Digital Systems Research Center, 1998.
- V. Vapnik. Statistical Learning Theory. Wiley, Chichester, GB, 1998.
- Y. Yao. Measuring retrieval effectiveness based on user preference of documents. Journal of the American Society for Information Science, 46(2):133–145, 1995.