Random sampling from a search engine's index

    J. ACM, Article No. 24, 2008.

    Keywords:
    well-recorded bias, Hastings algorithm, approximate Monte Carlo method, bias weight, Monte Carlo method

    Abstract:

    We revisit a problem introduced by Bharat and Broder almost a decade ago: How to sample random pages from the corpus of documents indexed by a search engine, using only the search engine's public interface? Such a primitive is particularly useful in creating objective benchmarks for search engines. The technique of Bharat and Broder...
