Measuring the clustering effect of BWT via RLE.

Theoretical Computer Science (2017)

Abstract
The Burrows–Wheeler Transform (BWT) is a reversible transformation on which several text compressors and many other tools used in Bioinformatics and Computational Biology are based. The BWT is not itself a compressor, but a transformation that performs a context-dependent permutation of the letters of the input text. This permutation often creates runs of equal letters (clusters) longer than the ones in the original text, a behavior usually referred to as the “clustering effect” of the BWT. In particular, from a combinatorial point of view, great attention has been given to the case in which the BWT produces the fewest clusters (cf. [5], [16], [21], [23]). In this paper we are concerned with the cases in which the clustering effect of the BWT is not achieved. For this purpose we introduce a complexity measure that counts the number of equal-letter runs of a word. This measure highlights that there exist many words for which the BWT gives an “un-clustering effect”, that is, the BWT produces a great number of short clusters.
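To make the run-counting measure concrete, the following is a minimal sketch, not the authors' implementation. It assumes the BWT is taken as the last column of the lexicographically sorted cyclic rotations of the word (no end-of-string marker), and the function names `bwt` and `count_runs` are hypothetical, chosen only for illustration.

```python
from itertools import groupby


def bwt(word: str) -> str:
    """Last column of the sorted cyclic rotations of `word`.

    Assumption: no sentinel character is appended; the transform is
    computed on the cyclic rotations of the word itself.
    """
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(word: str) -> int:
    """Number of maximal equal-letter runs (clusters) in `word`."""
    return sum(1 for _ in groupby(word))


if __name__ == "__main__":
    for w in ("banana", "aabb"):
        t = bwt(w)
        print(f"{w}: {count_runs(w)} runs -> {t}: {count_runs(t)} runs")
```

Under these assumptions, "banana" (6 runs) is mapped to "nnbaaa" (3 runs), which illustrates clustering, whereas "aabb" (2 runs) is mapped to "baba" (4 runs), a small instance of the un-clustering effect described above. The quadratic rotation sort is used only for readability; practical tools build the transform from a suffix array.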
Keywords
BWT, Permutation, Run-length encoding