Optimize document identifier assignment for inverted index compression

Journal of Computational Information Systems (2010)

Abstract
Document identifier assignment is a technique for compressing inverted file indexes by reducing the d-gap values in posting lists. Existing work approaches it with either TSP-based or clustering methods. However, the problem has no proper formulation, and the existing approaches have no theoretical guarantee of being good approximations. In this paper, we first formulate the document identifier assignment problem as an optimization problem and then propose a new method that solves it approximately. Our method first clusters the documents by URL information and then rearranges the documents and clusters using a benefit function, which is derived by directly minimizing posting space. The TSP method can be considered a simple special case of our method. Experiments show that our method achieves a good trade-off between efficiency and effectiveness. © 2010 Binary Information Press.
Keywords
Cluster, Document identifier, Inverted index compression, Optimization
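
To illustrate why identifier assignment affects index size, the following minimal sketch compares the posting space of a toy index under two assignments. It is not the paper's method or its benefit function: the Elias gamma bit count is an assumed cost model, and the names `d_gaps`, `gamma_bits`, `index_cost`, the toy index, and the document names are all hypothetical.

```python
from typing import Dict, List


def d_gaps(postings: List[int]) -> List[int]:
    """Convert a sorted posting list of document IDs into d-gaps."""
    gaps = []
    prev = 0
    for doc_id in postings:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def gamma_bits(gap: int) -> int:
    """Bits used by Elias gamma coding for a positive integer (assumed cost model)."""
    return 2 * gap.bit_length() - 1


def index_cost(index: Dict[str, List[str]], assignment: Dict[str, int]) -> int:
    """Total bits to store all posting lists under a document-to-ID assignment."""
    total = 0
    for docs in index.values():
        ids = sorted(assignment[d] for d in docs)
        total += sum(gamma_bits(g) for g in d_gaps(ids))
    return total


# Toy index: term "x" occurs only on site a, term "y" only on site b.
index = {
    "x": ["a/1", "a/2", "a/3"],
    "y": ["b/1", "b/2", "b/3"],
}

# IDs alternate between the two sites.
interleaved = {"a/1": 1, "b/1": 2, "a/2": 3, "b/2": 4, "a/3": 5, "b/3": 6}
# Documents from the same site receive consecutive IDs.
clustered = {"a/1": 1, "a/2": 2, "a/3": 3, "b/1": 4, "b/2": 5, "b/3": 6}

print(index_cost(index, interleaved))  # 16
print(index_cost(index, clustered))    # 10
```

In this toy case, giving consecutive identifiers to documents that share a URL prefix shrinks the gaps inside each posting list, which is the effect that clustering by URL aims for.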