LZ77 Like Lossy Transformation of Quality Scores

2018 Data Compression Conference (2018)

Abstract
Recent advances in Next-Generation Sequencing (NGS) technologies and their growing use produce huge amounts of sequencing data, which must be transferred and stored efficiently. This article introduces a novel lossy transformation algorithm for quality scores in sequencing data. Asymptotically, the algorithm preserves the likelihood of occurrence of a particular quality score at individual positions of the quality sequences. Such a model may be advantageous for sequencing data with very high coverage, such as targeted amplicon sequencing data. In the experimental results, we compare the characteristics of this algorithm with those of other algorithms that perform lossy compression of quality sequences. The proposed algorithm can be easily integrated into current sequencing pipelines. In this work we apply the algorithm to SAM files, which are then compressed into BAM files. The goal of the algorithm is to modify the data so that the subsequent application of the Deflate algorithm achieves a better compression ratio while minimizing the negative effects on subsequent variant calling.
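The claim that a lossy transformation of quality scores can help a later Deflate pass can be illustrated with a minimal sketch. It does not implement the proposed algorithm: it swaps in plain fixed-width binning of Phred scores as a stand-in lossy step, runs on synthetic, uniformly random quality strings, and uses Python's zlib as the Deflate implementation. The function names, the bin width of 8, and the data are all assumptions made for illustration.

```python
import random
import zlib


def synthetic_qualities(n_reads=200, read_len=100, seed=0):
    """Generate hypothetical Phred+33 quality strings (illustrative data only)."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        scores = [rng.randint(2, 40) for _ in range(read_len)]
        reads.append("".join(chr(q + 33) for q in scores))
    return reads


def bin_quality(q, width=8):
    """Map a Phred score to the lower bound of its bin.

    This is a stand-in lossy step, NOT the algorithm from the paper.
    """
    return (q // width) * width


def transform(read, width=8):
    """Apply the stand-in lossy step to every score in one quality string."""
    return "".join(chr(bin_quality(ord(c) - 33, width) + 33) for c in read)


def deflate_size(reads):
    """Size in bytes of the newline-joined quality strings after Deflate (zlib)."""
    return len(zlib.compress("\n".join(reads).encode("ascii"), 9))


reads = synthetic_qualities()
original_size = deflate_size(reads)
lossy_size = deflate_size([transform(r) for r in reads])
print("Deflate size, original qualities:", original_size)
print("Deflate size, binned qualities:  ", lossy_size)
```

Reducing the number of distinct symbols lowers the entropy that Deflate has to encode, so the binned strings should compress to fewer bytes. The sketch says nothing about the effect on variant calling, which is the other half of the trade-off the abstract describes.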
Keywords
LZ77, LLZT, Lossy Compression, Quality Scores, BAM