IRLbot: scaling to 6 billion pages and beyond

    ACM Transactions on the Web (TWEB), Article No. 8, 2008.

    Keywords:
    billion connection request, web graph, article share, single-server implementation, large-scale

    Abstract:

    This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with the quadratically increasing complexity of verifying URL uniqueness, BFS crawl order, and fixed per-host rate-limiting, current crawling algorithms ca...

    Best Paper
    Best Paper of WWW, 2008