Capturing collection size for distributed non-cooperative retrieval.
SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (2006)
Abstract
Modern distributed information retrieval techniques require accurate knowledge of collection size. In non-cooperative environments, where detailed collection statistics are not available, the size of the underlying collections must be estimated. While several approaches for the estimation of collection size have been proposed, their accuracy has not been thoroughly evaluated. An empirical analysis of past estimation approaches across a variety of collections demonstrates that their prediction accuracy is low. Motivated by ecological techniques for the estimation of animal populations, we propose two new approaches for the estimation of collection size. We show that our approaches are significantly more accurate than previous methods, and are more efficient in their use of the resources required to perform the estimation.
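As an illustration of the kind of ecological technique the abstract alludes to, the sketch below implements the classic two-sample Lincoln-Petersen capture-recapture estimator. It is not presented as either of the paper's two proposed approaches, which the abstract does not specify. The function name `lincoln_petersen` and the idea of treating document identifiers returned by two independent rounds of queries as the two "captures" are assumptions made for this example.

```python
def lincoln_petersen(first_sample, second_sample):
    """Estimate a population size from two independent random samples.

    Hypothetical illustration: each sample is an iterable of document
    identifiers obtained by querying a collection. The estimate is
    (size of sample 1 * size of sample 2) / (number of shared items).
    """
    first = set(first_sample)
    second = set(second_sample)
    recaptured = len(first & second)  # documents seen in both samples
    if recaptured == 0:
        # With no overlap the estimator divides by zero and is undefined.
        raise ValueError("no overlap between samples; estimate is undefined")
    return len(first) * len(second) / recaptured


# Two samples of 100 document IDs each that share 20 documents.
estimate = lincoln_petersen(range(0, 100), range(80, 180))
print(estimate)  # 100 * 100 / 20 = 500.0
```

The estimator assumes both samples are drawn uniformly at random from the collection. Documents retrieved through a search interface rarely satisfy that assumption, which is one plausible reason simple estimators can be inaccurate in this setting.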