Engineering Rank/Select Data Structures for Big-Alphabet Strings

Diego Arroyuelo, Gabriel Carmona, Héctor Larrañaga, Francisco Riveros, Erick Sepúlveda

CoRR (2023)

Abstract
Big-alphabet strings are common in several scenarios, such as information retrieval and natural-language processing. Storing and processing such strings efficiently usually poses challenges that do not arise with small-alphabet strings. This paper studies the efficient implementation of one of the most effective approaches for dealing with big-alphabet strings, namely the alphabet-partitioning approach. The main contribution is a compressed data structure that efficiently supports the fundamental operations rank and select. We show experimental results indicating that our implementation outperforms existing realizations of the alphabet-partitioning approach. In particular, the time of the select operation can be reduced by about 80%, using only 11% more space than current alphabet-partitioning schemes. We also show the impact of our data structure on several applications, such as the intersection of inverted lists (where improvements of up to 60% are achieved using only 2% of extra space), the representation of run-length compressed strings, and the distributed processing of rank and select operations.
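The abstract does not spell out how alphabet partitioning answers queries, so the following is a minimal, uncompressed Python sketch of the general idea as it is commonly described in the literature; it is not the authors' implementation. Each symbol is mapped to a class and to a local identifier within that class; the string is then represented by a sequence K of class identifiers plus, for each class k, a sequence L[k] of local identifiers in text order. Rank and select on the original string reduce to rank and select on these smaller-alphabet sequences: rank_c(S, i) = rank_l(L[k], rank_k(K, i)) and select_c(S, j) = select_k(K, select_l(L[k], j) + 1), where k is the class of c and l its local identifier. The class name `AlphabetPartitionedString`, the class rule (the symbol with 1-based frequency rank r goes to class floor(log2 r)), and the linear-scan helpers are assumptions for illustration only; a practical structure would replace the scans with compressed rank/select sequences.

```python
from collections import Counter


class AlphabetPartitionedString:
    """Hypothetical, naive illustration of alphabet partitioning.

    Assumption: symbols are ranked by decreasing frequency and the symbol
    with 1-based rank r is assigned to class floor(log2 r). Rank/select on
    the component sequences use linear scans, not compressed structures.
    """

    def __init__(self, text):
        freq = Counter(text)
        # Decreasing frequency; ties broken by symbol for determinism.
        ordered = sorted(freq, key=lambda s: (-freq[s], s))
        self.cls = {}    # symbol -> class id
        self.local = {}  # symbol -> local id within its class
        sizes = Counter()
        for r, s in enumerate(ordered, start=1):
            k = r.bit_length() - 1  # floor(log2 r) for r >= 1
            self.cls[s] = k
            self.local[s] = sizes[k]
            sizes[k] += 1
        # K: class id of every text position.
        self.K = [self.cls[s] for s in text]
        # L[k]: local ids of the class-k symbols, in text order.
        self.L = {k: [] for k in sizes}
        for s in text:
            self.L[self.cls[s]].append(self.local[s])

    @staticmethod
    def _rank(seq, x, i):
        # Number of occurrences of x in seq[0:i].
        return sum(1 for y in seq[:i] if y == x)

    @staticmethod
    def _select(seq, x, j):
        # 0-based position of the j-th (1-based) occurrence of x, or -1.
        seen = 0
        for p, y in enumerate(seq):
            if y == x:
                seen += 1
                if seen == j:
                    return p
        return -1

    def rank(self, c, i):
        """Occurrences of symbol c in text[0:i]."""
        if c not in self.cls:
            return 0
        k = self.cls[c]
        m = self._rank(self.K, k, i)  # class-k positions before i
        return self._rank(self.L[k], self.local[c], m)

    def select(self, c, j):
        """0-based position of the j-th occurrence of c (j >= 1), or -1."""
        if c not in self.cls:
            return -1
        k = self.cls[c]
        p = self._select(self.L[k], self.local[c], j)
        if p < 0:
            return -1
        # The (p+1)-th class-k entry of K is that occurrence's text position.
        return self._select(self.K, k, p + 1)
```

For example, with `s = AlphabetPartitionedString("abracadabra")`, `s.rank("a", 11)` returns 5 and `s.select("a", 3)` returns 5 (a 0-based position). Grouping symbols of similar frequency is what keeps the alphabet of K small (here three classes for five symbols), while each L[k] only has to distinguish the symbols inside one class.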
Keywords
rank/select data structures, big-alphabet