IRLbot: scaling to 6 billion pages and beyond
ACM Transactions on the Web (TWEB), Article No. 8, 2008.
Abstract:
This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with the quadratically increasing complexity of verifying URL uniqueness, BFS crawl order, and fixed per-host rate-limiting, current crawling algorithms ca...
Best Paper of WWW, 2008