Efficient Search Engine Measurements
TWEB (ACM Transactions on the Web), Article No. 18, 2011.
Keywords: corpus size, costly rejection, approximate importance, document degree, new importance
We address the problem of externally measuring aggregate functions over documents indexed by search engines, such as corpus size, index freshness, and density of duplicates in the corpus. State-of-the-art estimators for such quantities [Bar-Yossef and Gurevich 2008b; Broder et al. 2006] are biased due to inaccurate approximation of the so ca...
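As a generic illustration only (this is not the estimator proposed in the article), the sketch below shows how a self-normalized importance-sampling estimate of an aggregate, here the fraction of duplicate documents, depends on knowing each sample's weight correctly. It assumes a hypothetical toy corpus where documents are sampled with probability proportional to a "degree" (the number of queries returning them). The corpus, the duplicate rule, the function names, and the cap-at-5 "approximate degree" are all invented for illustration.

```python
import random
from typing import Callable, Hashable, Sequence


def importance_sampling_mean(
    samples: Sequence[Hashable],
    f: Callable[[Hashable], float],
    weight: Callable[[Hashable], float],
) -> float:
    """Self-normalized importance-sampling estimate of the uniform mean of f.

    `weight(x)` should be proportional to the probability with which the
    sampler returned x; only the ratios between weights matter.
    """
    num = 0.0
    den = 0.0
    for x in samples:
        w = weight(x)
        if w <= 0:
            raise ValueError(f"non-positive weight for sample {x!r}")
        num += f(x) / w   # reweight f(x) by 1/w(x)
        den += 1.0 / w    # normalizer: proportional to the population size
    return num / den


# Hypothetical toy corpus: document i has "degree" 1 + (i % 10), i.e. the
# number of queries that return it, and is a duplicate when i % 10 >= 5.
DOCS = list(range(1000))
DEGREE = {d: 1 + (d % 10) for d in DOCS}
IS_DUP = {d: 1.0 if d % 10 >= 5 else 0.0 for d in DOCS}


def sample_by_degree(rng: random.Random, k: int) -> list:
    """Draw k documents with probability proportional to their degree."""
    return rng.choices(DOCS, weights=[DEGREE[d] for d in DOCS], k=k)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = sample_by_degree(rng, 100_000)
    true_frac = sum(IS_DUP.values()) / len(DOCS)
    naive = sum(IS_DUP[d] for d in samples) / len(samples)
    exact = importance_sampling_mean(samples, IS_DUP.__getitem__, DEGREE.__getitem__)
    # Hypothetical approximate degree that caps at 5: the weights are wrong
    # for high-degree documents, so this estimate stays biased as k grows.
    approx = importance_sampling_mean(
        samples, IS_DUP.__getitem__, lambda d: min(DEGREE[d], 5)
    )
    print(f"true={true_frac:.3f} naive={naive:.3f} exact={exact:.3f} approx={approx:.3f}")
```

In this toy setup the true duplicate fraction is 0.5. Unweighted sampling tends toward 40/55 (about 0.73) because duplicates have higher degree. Exact weights recover roughly 0.5, while the capped weights converge to 8/13 (about 0.62). Collecting more samples does not remove that bias, because the error lies in the weights rather than in the sample size.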