IRLbot: scaling to 6 billion pages and beyond

ACM Transactions on the Web (TWEB), Article No. 8, 2008.

Keywords:
billion connection request, web graph, article share, single-server implementation, large-scale

Abstract:

This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with the quadratically increasing complexity of verifying URL uniqueness, BFS crawl order, and fixed per-host rate-limiting, current crawling algorithms ca...

Best Paper of WWW, 2008