Fast Label Extraction in the CDAWG.

Lecture Notes in Computer Science(2017)

引用 14|浏览24
暂无评分
摘要
The compact directed acyclic word graph (CDAWG) of a string T of length n takes space proportional just to the number e of right extensions of the maximal repeats of T, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which e grows significantly more slowly than n. We reduce from O(mlog log n) to O(m) the time needed to count the number of occurrences of a pattern of length m, using an existing data structure that takes an amount of space proportional to the size of the CDAWG. This implies a reduction from O(mlog log n+occ) to O(m+occ) in the time needed to locate all the occ occurrences of the pattern. We also reduce from O(k log log n) to O(k) the time needed to read the k characters of the label of an edge of the suffix tree of T, and we reduce from O(mlog log n) to O(m) the time needed to compute the matching statistics between a query of length m and T, using an existing representation of the suffix tree based on the CDAWG. All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG.
更多
查看译文
关键词
CDAWG,Suffix tree,Maximal repeat,Straight-line program,Count query,Locate query,Matching statistics,Minimal absent words
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要