Federated text retrieval from uncooperative overlapped collections.

IR(2007)

Abstract
In federated text retrieval systems, the query is sent to multiple collections at the same time. The results returned by the collections are gathered and ranked by a central broker that presents them to the user. It is usually assumed that the collections have little overlap. In practice, however, collections may share many documents as either exact or near duplicates, potentially leading to high numbers of duplicates in the final results. Given the natural bandwidth restrictions and efficiency issues of federated search, sending queries to redundant collections incurs unnecessary costs. We propose a novel sampling-based method for estimating the rate of overlap among collections. Then, using the estimated overlap statistics, we propose two collection selection methods that aim to maximize the number of unique relevant documents in the final results. We show experimentally that, although our estimates of overlap are not exact, our suggested techniques can significantly improve search effectiveness when collections overlap.
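The abstract does not spell out the estimator or the two selection methods, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes uniform random samples of document identifiers and known (or separately estimated) collection sizes. The function names `estimate_overlap` and `select_collections`, the per-collection relevance estimates, and the greedy discounting rule are all hypothetical choices made for this example.

```python
from typing import Dict, Hashable, List, Set, Tuple


def estimate_overlap(sample_a: Set[Hashable], sample_b: Set[Hashable],
                     size_a: int, size_b: int) -> float:
    """Estimate how many documents two collections share, from samples.

    Assumption (not from the paper): both samples are uniform random
    draws of document identifiers, and the collection sizes are known.
    """
    if not sample_a or not sample_b:
        return 0.0
    common = len(sample_a & sample_b)
    # A shared document lands in both samples with probability
    # (|sample_a| / size_a) * (|sample_b| / size_b), so scale the count up.
    estimate = common * (size_a * size_b) / (len(sample_a) * len(sample_b))
    # The overlap cannot exceed the smaller collection.
    return min(estimate, float(min(size_a, size_b)))


def select_collections(relevant: Dict[str, float], sizes: Dict[str, int],
                       overlap: Dict[Tuple[str, str], float],
                       k: int) -> List[str]:
    """Greedily pick k collections, discounting overlap with earlier picks.

    `relevant` holds a hypothetical estimate of relevant documents per
    collection; `overlap` holds estimated shared-document counts per pair.
    """
    selected: List[str] = []
    candidates = set(relevant)
    while candidates and len(selected) < k:
        def gain(c: str) -> float:
            # Fraction of c covered by its most-overlapping selected peer.
            covered = max(
                (overlap.get((c, s), overlap.get((s, c), 0.0)) / sizes[c]
                 for s in selected),
                default=0.0,
            )
            return relevant[c] * (1.0 - min(covered, 1.0))

        # Sorting makes tie-breaking deterministic.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With this heuristic, a collection that looks promising on its own but is largely covered by an already selected collection drops in priority, which is the intuition behind maximizing unique relevant documents rather than raw relevance.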