Efficiently encoding term co-occurrences in inverted indexes.

CIKM '11: International Conference on Information and Knowledge Management Glasgow Scotland, UK October, 2011(2011)

引用 10|浏览80
暂无评分
摘要
Precomputation of common term co-occurrences has been successfully applied to improve query performance in large scale search engines based on inverted indexes. The results of such precomputations are traditionally stored as additional posting lists in the index. During query evaluation, these precomputed lists are used to reduce the number of query terms, as the results for multiple terms can be accessed through a single precomputed list. In this paper, we expand this paradigm by considering an alternative method for storing term co-occurrences in inverted indexes. For a selected set of terms in the index, we store bitmaps that encode term co-occurrences. A bitmap of size k for term t augments each posting to store the co-occurrences of t with k other terms, across every document in the index. At query evaluation, size k bitmaps can be used to answer queries that involve any of the 2^k combinations of the additional terms. In contrast, a precomputed list, although typically shorter, can only be used to evaluate queries containing all of its terms. We evaluate the bitmaps technique we propose, and the baseline of adding precomputed posting lists and show that they are complementary, as they capture different aspects of the query evaluation cost. We perform an experimental evaluation on the TREC WT10g corpus and show that a hybrid strategy combining both methods significantly lowers the cost of query evaluation compared to each method separately.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要