Towards an Open and Highly Distributed Web Information Retrieval Architecture

Abstract
Due to the large size of the Web, users require specialized tools to navigate through the vast volumes of data, and a number of search engines and other IR tools have been built to fill this need. The major engines are typically based on scalable clusters, i.e., large numbers of low-cost servers at a single location. Recent events have seen a concentration in this market towards a small number of major players that offer their own proprietary ranking and user tools. For various reasons, these engines do not provide open interfaces to the lower layers of their infrastructure, but offer a service that combines these layers with ranking and user interface. We are proposing a two-tier model of web IR architectures that separates the lower layers of data acquisition, index construction, and index lookups from the higher layers of ranking and user interfaces. Under this model, we investigate the problem of implementing the lower tier in a peer-to-peer environment, with an open and agnostic interface that allows a variety of search and navigation tools and interfaces, located at clients or intermediaries, to be built on top. Our approach is motivated by expected increases in client computing power and bandwidth. In particular, we see the possibility for a rich variety of novel search and navigational tools and interfaces that exploit client computing power and bandwidth and that rely on access to a powerful lower-level search infrastructure. These tools may perform a large number of search engine accesses during a single user interaction, and present the results in a highly optimized and aggregated form. We speculate that the expected increase in query load could be handled by a highly distributed scalable architecture at the lower tier that offers an open interface to the upper tier. We admit that this vision faces significant technical and economic hurdles, and we are not at all certain it will come to pass. 
However, we argue that the idea of an open, agnostic, and scalable infrastructure is interesting enough to merit detailed study, as it is also relevant in more limited scenarios such as search in intranet environments or P2P communities, and thus we have started designing and implementing a prototype at Polytechnic. In this position paper, we discuss pros and cons of this approach, outline the basic technical challenges, and describe a prototype we are building.

Contact author: suel@poly.edu