Random sampling from a search engine's index
J. ACM, Article No. 24, 2008.
well-recorded bias, Hastings algorithm, approximate Monte Carlo method, bias weight, Monte Carlo method
We revisit a problem introduced by Bharat and Broder almost a decade ago: How to sample random pages from the corpus of documents indexed by a search engine, using only the search engine's public interface? Such a primitive is particularly useful in creating objective benchmarks for search engines. The technique of Bharat and Broder...