RabbitQCPlus 2.0: More efficient and versatile quality control for sequencing data

Methods(2023)

引用 0|浏览10
暂无评分
摘要
Assessing the quality of sequencing data plays a crucial role in downstream data analysis. However, existing tools often achieve sub-optimal efficiency, especially when dealing with compressed files or performing complicated quality control operations such as over-representation analysis and error correction. We present RabbitQCPlus, an ultra-efficient quality control tool for modern multi-core systems. RabbitQCPlus uses vectorization, memory copy reduction, parallel (de)compression, and optimized data structures to achieve substantial performance gains. It is 1.1 to 5.4 times faster when performing basic quality control operations compared to state-of-the-art applications yet requires fewer compute resources. Moreover, RabbitQCPlus is at least 4 times faster than other applications when processing gzip-compressed FASTQ files and 1.3 times faster with the error correction module turned on. Furthermore, it takes less than 4 minutes to process 280 GB of plain FASTQ sequencing data, while other applications take at least 22 minutes on a 48-core server when enabling the per-read over-representation analysis. C++ sources are available at https://github.com/RabbitBio/RabbitQCPlus.
更多
查看译文
关键词
Sequencing data, Quality control, Gzip-compressed, Over-representation, Error correction, Vectorization, HPC
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要