Efficient indexing and query processing in distributed search engines (2008)

Abstract
With the rapid growth of the web, more and more people use web search engines as their primary means of locating relevant information. Such search engines are built on large clusters of hundreds or thousands of servers, and employ numerous published and proprietary performance optimizations in order to support thousands of queries per second on billions of web pages. In contrast to this fairly centralized architecture, we look into the problem of building a highly distributed search engine, where each participating machine is in a separate location and each query involves cooperation between a number of such machines over the internet. In this thesis, we focus on the case of a global index organization in a highly distributed environment, where the main bottleneck is the amount of communication required during query evaluation. We propose several query execution policies as well as inverted list assignment algorithms to decrease the transmission costs among nodes during query processing in distributed search engines. In addition, we study one particular situation and propose a general framework for indexing and query processing of archival collections and, more generally, any collections with a sufficient amount of redundancy. The approach results in significant reductions in index size and query processing costs. Our work is useful not only for competing with the current, more centralized engines on web search tasks, but also for the probably more realistic challenge of efficiently indexing and searching textual data residing in distributed systems.
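To make the communication bottleneck concrete, the following is a minimal sketch, not the thesis's actual policies or assignment algorithms. It assumes a toy global (term-partitioned) index in which each term's whole inverted list lives on one node, and it counts postings shipped between nodes as a stand-in for transmission cost. The node names, the sample lists, and the functions `locate`, `ship_all_cost`, and `shortest_first` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy global index: each term's full inverted list
# (sorted document IDs) is stored on exactly one node.
INDEX: Dict[str, Dict[str, List[int]]] = {
    "node_a": {"distributed": [1, 3, 5, 8, 13, 21]},
    "node_b": {"search": [3, 8, 9]},
}


def locate(term: str) -> str:
    """Return the node that stores the inverted list for `term`."""
    for node, lists in INDEX.items():
        if term in lists:
            return node
    raise KeyError(term)


def ship_all_cost(terms: List[str]) -> int:
    """Postings sent if every list is shipped to a separate coordinator."""
    return sum(len(INDEX[locate(t)][t]) for t in terms)


def shortest_first(terms: List[str]) -> Tuple[List[int], int]:
    """Intersect lists shortest first, forwarding the running result.

    Returns the matching document IDs and the number of postings sent
    between nodes (returning the final result to the user is not counted).
    """
    ordered = sorted(terms, key=lambda t: len(INDEX[locate(t)][t]))
    current_node = locate(ordered[0])
    result = list(INDEX[current_node][ordered[0]])
    sent = 0
    for term in ordered[1:]:
        node = locate(term)
        if node != current_node:
            sent += len(result)  # ship the intermediate result to the next node
            current_node = node
        postings = set(INDEX[node][term])
        result = [doc for doc in result if doc in postings]
    return result, sent
```

In this toy setting, forwarding the shorter list to the node that holds the longer one sends fewer postings than shipping both lists to a coordinator. The example only illustrates why the choice of execution policy and list placement affects transmission cost.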
Keywords
web page, query evaluation, query execution policy, web search task, efficient indexing, search engine, web search engine, query processing, centralized engine, query processing cost