Random Sampling from a Search Engine's Corpus

    Abstract:

    We revisit a problem introduced by Bharat and Broder almost a decade ago: how to sample random pages from the corpus of documents indexed by a search engine, using only the search engine's public interface? Such a primitive is particularly useful in creating objective benchmarks for search engines. The technique of Bharat and Broder suffe...
