ReadsClean: a new approach to error correction of sequencing reads based on alignments clustering

Fokin Oleg, Bakulina Anastasia, Seledtsov Igor, Solovyev Victor

arXiv (2019)

Abstract
Motivation: Next-generation DNA sequencing methods produce a relatively high rate of read errors, which interfere with de novo genome assembly of newly sequenced organisms and particularly affect the quality of SNP detection, which is important for the diagnosis of many hereditary diseases. A number of programs have been developed for correcting errors in NGS reads. These programs use various approaches and are optimized for different specific tasks, but all of them are far from being able to correct all errors, especially in sequencing reads that cross repeats and in reads from di/polyploid eukaryotic genomes.

Results: This paper describes a novel method of error correction based on clustering of alignments of similar reads. The method is implemented in the ReadsClean program, which is designed for cleaning Illumina HiSeq sequencing reads. We compared ReadsClean to other read-cleaning programs recognized as the best by several publications. In our sequence assembly tests on actual and simulated sequencing reads, ReadsClean achieved superior results.

Availability and implementation: ReadsClean is implemented as standalone C code. It is incorporated in an error correction pipeline and is freely available to academic users at the Softberry web server, www.softberry.com.