Efficient query processing in large web search engines (2006)

Abstract
Large web search engines must answer thousands of queries per second with interactive response times. Because the data sets involved are so large, often in the range of multiple terabytes, a single query may require processing hundreds of megabytes or more of index data. Query processing performance is therefore a critical issue for Web search engines. To keep up with this immense workload, large search engines employ clusters of hundreds or thousands of machines and use techniques such as caching, index compression, and index and query pruning to increase query throughput and decrease overall cost. In this thesis, we investigate three of these techniques (index compression, caching, and query pruning) and demonstrate how they can be used effectively to increase the throughput of query processing in Web search engines. First, we revisit several well-known compression schemes for inverted index structures and compare their compression ratios, decoding overheads, and impact on query processing performance. Next, we present a three-level caching architecture (result cache, list cache, and a new projection cache) and study several cache replacement policies at the different levels. Finally, we propose query pruning algorithms that exploit a global ordering of Web pages (e.g., PageRank) to optimize query processing. For experimental evaluation, we use a search engine platform that we developed as part of this dissertation research, a large document collection crawled from the Web, and query logs collected by commercial search engines.
Keywords
query pruning algorithm, single query, optimized query processing, Web search engine, efficient query processing, query throughput, query pruning, query log, index compression, large web search engine, commercial search engine, query processing
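
The abstract refers to well-known compression schemes for inverted index structures without naming them. As a rough illustration only, the sketch below shows variable-byte coding of docID gaps, one widely used scheme of this kind; it is not taken from the thesis. The function names `vbyte_encode` and `vbyte_decode`, the low-order-first byte layout, and the requirement that docIDs be positive and strictly increasing are all assumptions made for this example.

```python
from typing import Iterable, List


def vbyte_encode(doc_ids: Iterable[int]) -> bytes:
    """Encode a strictly increasing list of positive docIDs as variable-byte gaps.

    Hypothetical helper for illustration; not the thesis implementation.
    """
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev  # store the difference to the previous docID
        if gap <= 0:
            raise ValueError("docIDs must be positive and strictly increasing")
        prev = doc_id
        # Emit 7 payload bits per byte, low-order bits first.
        # A set high bit means "more bytes follow for this gap".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Decode bytes produced by vbyte_encode back into the docID list."""
    doc_ids: List[int] = []
    prev = 0
    current = 0
    shift = 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation byte: keep accumulating this gap
        else:
            prev += current  # final byte of the gap: rebuild the docID
            doc_ids.append(prev)
            current = 0
            shift = 0
    return doc_ids


postings = [3, 7, 150, 151, 100000]
encoded = vbyte_encode(postings)
decoded = vbyte_decode(encoded)
```

Small gaps fit in a single byte, so a dense posting list shrinks well below the 4 bytes per docID of a plain 32-bit array, while decoding needs only shifts and masks. That trade-off between compression ratio and decoding overhead is the kind of comparison the abstract describes.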