Gapped Indexing for Consecutive Occurrences

CPM(2022)

Abstract
The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting queries (return all positions where the pattern occurs), and counting queries (return the number of occurrences of the pattern). In this paper we consider a variant of string indexing, where the goal is to compactly represent the string such that given two patterns P_1 and P_2 and a gap range [α, β] we can quickly find the consecutive occurrences of P_1 and P_2 with distance in [α, β], i.e., pairs of subsequent occurrences with distance within the range. We present data structures that use linear space and query time O(|P_1| + |P_2| + n^{2/3}) for existence and counting and O(|P_1| + |P_2| + n^{2/3} occ^{1/3}) for reporting. We complement this with a conditional lower bound based on the set intersection problem showing that any solution using O(n) space must use Ω(|P_1| + |P_2| + √n) query time. To obtain our results we develop new techniques and ideas of independent interest, including a new suffix tree decomposition and hardness of a variant of the set intersection problem.
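To make the query concrete, here is a naive reference sketch in Python. It is not the data structure from the paper: it scans the string at query time and uses no index. It assumes that a consecutive occurrence is a pair (i, j) with i < j, where P_1 occurs at i, P_2 occurs at j, no occurrence of either pattern starts strictly between them, and the distance is j - i. The abstract does not spell out this definition, so treat it as an assumption. The function names are hypothetical.

```python
def find_occurrences(text, pattern):
    """Return all start positions of pattern in text (overlaps allowed)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def consecutive_occurrences(text, p1, p2, alpha, beta):
    """Report pairs (i, j) of consecutive occurrences of p1 and p2
    whose distance j - i lies in [alpha, beta].

    Assumed definition: p1 starts at i, p2 starts at j, i < j, and no
    occurrence of p1 or p2 starts strictly between i and j.
    """
    occ1 = set(find_occurrences(text, p1))
    occ2 = set(find_occurrences(text, p2))
    # Merge all start positions; neighbours in this sorted list have no
    # occurrence of either pattern between them.
    events = sorted(occ1 | occ2)
    pairs = []
    for i, j in zip(events, events[1:]):
        if i in occ1 and j in occ2 and alpha <= j - i <= beta:
            pairs.append((i, j))
    return pairs


text = "a.b.a..b.ab"
reported = consecutive_occurrences(text, "a", "b", 2, 3)  # reporting query
count = len(reported)                                      # counting query
exists = count > 0                                         # existence query
```

Under these assumptions the three query types differ only in how much of the result they return. The scan takes time at least linear in the length of the string per query, which is the cost the linear-space index in the paper is designed to avoid.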
Keywords
String indexing, Two patterns, Consecutive occurrences, Conditional lower bound