On the feasibility of geographically distributed web crawling.

Proceedings of the 3rd International Conference on Scalable Information Systems (2008)

Abstract
We identify the issues that are important in the design of a geographically distributed Web crawler and discuss each from both a "benefit" and a "challenge" point of view. More specifically, we focus on the effect of the geographical locality of Web sites on crawling performance and, as a practical study, investigate the feasibility of a distributed crawler in terms of network costs. For this purpose, we conduct various experiments to collect network access statistics about the servers in the educational domains of eight different countries (USA, Canada, Chile, Brazil, Spain, Portugal, Turkey, and Greece). We gather the statistics using echoping from four different sites located in the USA, Brazil, Spain, and Turkey. The results favor geographically distributed Web crawling in terms of crawling throughput.
Keywords
Web crawler, Web crawling, Web site, crawling performance, crawling throughput, different country, different site, network access statistic, network cost, educational domain