Matchtigs: minimum plain text representation of k-mer sets

Genome Biology (2023)

Abstract
We propose a polynomial-time algorithm that computes a minimum plain-text representation of k-mer sets, as well as an efficient near-minimum greedy heuristic. When compressing read sets of large model organisms or bacterial pangenomes, we shrink the representation by up to 59% over unitigs and 26% over previous work, with only a minor runtime increase. Additionally, the number of strings decreases by up to 97% over unitigs and 90% over previous work. Finally, a small representation has advantages in downstream applications: it speeds up SSHash-Lite queries by up to 4.26× over unitigs and 2.10× over previous work.
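To make the notion of a plain-text representation concrete, the following minimal sketch checks that a list of strings contains exactly a given k-mer set and measures the two quantities the abstract refers to: the number of strings and the total number of characters. It is an illustration only, not the algorithm from the paper. The helper names `kmers_of` and `cumulative_length` and the toy k-mer set are assumptions made for this example, and the sketch ignores reverse complements, which tools working on DNA typically treat as equivalent to the forward k-mer.

```python
def kmers_of(strings, k):
    """Return the set of all length-k substrings of the given strings."""
    return {s[i:i + k] for s in strings for i in range(len(s) - k + 1)}


def cumulative_length(strings):
    """Total number of characters across all strings."""
    return sum(len(s) for s in strings)


k = 3
kmer_set = {"ACG", "CGT", "GTA", "CGA"}  # toy data, hypothetical

# Naive representation: every k-mer is stored as its own string.
one_per_string = sorted(kmer_set)

# Smaller representation: k-mers that overlap by k-1 characters are merged.
merged = ["ACGTA", "CGA"]

# Both lists represent exactly the same k-mer set.
assert kmers_of(one_per_string, k) == kmer_set
assert kmers_of(merged, k) == kmer_set

print(len(one_per_string), cumulative_length(one_per_string))  # 4 12
print(len(merged), cumulative_length(merged))                  # 2 8
```

In this toy case, merging overlapping k-mers halves the number of strings and reduces the character count from 12 to 8. A minimum representation is one for which no other list of strings with the same k-mer content has a smaller total length.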
Keywords
k-mer sets, Plain text compression, Graph algorithm, Sequence analysis, Genomic sequences, Minimum-cost flow, Chinese postman problem