SeqCorrect – A Modular Toolkit for DNA Sequencing Error Correction and Evaluation

Semantic Scholar (2017)

Abstract
Motivation: Although a plethora of sequencing error correction tools exist, the field still lacks a generalized, modular framework for this task. In particular, current sequencing error correction methods often ignore wet-lab- and run-dependent error profile characteristics. Many tools also ignore sequence-specific errors and do not explicitly model the G/C coverage bias of sequencers. Encapsulating this functionality in separate, user-friendly modules will facilitate the development of future sequencing error correction tools.

Results: We compute expected k-mer counts under an idealized sequencing model and infer run-dependent median G/C coverage biases by counting k-mers in the read dataset and comparing the observed counts with their expected values. We classify k-mers as untrusted, unique, or repetitive. We correct substitution, insertion, and deletion errors, and handle repetitive regions by locally and adaptively increasing the k-mer size within a read. Our error correction approach introduces fewer new errors (false positives) than other tools, but also corrects fewer errors in total. The main purpose of our toolkit is to simplify the design and evaluation of future error correction approaches.

Availability: SeqCorrect is implemented in C++ and is available for download at https://github.com/lutteropp/SeqCorrect under the GNU GPL-3.0 license.

Contact: sarah.lutteropp@h-its.org, alexandros.stamatakis@h-its.org
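The Results paragraph names three k-mer statistics steps: expected counts under an idealized model, median G/C bias from observed versus expected counts, and the untrusted/unique/repetitive classification. The C++ below is a minimal sketch of those ideas under stated assumptions, not SeqCorrect's actual code. It assumes uniform, error-free sampling, so that a k-mer occurring once in a genome of length G is expected about N·(L−k+1)/G times for N reads of length L. It also uses plain string k-mers without reverse-complement merging. The function names, the `minCount` filter and the 0.5×/1.5× classification thresholds are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using KmerCounts = std::unordered_map<std::string, std::size_t>;

// Count every k-mer in the reads (plain strings; no canonical
// reverse-complement merging, unlike most production k-mer counters).
KmerCounts countKmers(const std::vector<std::string>& reads, std::size_t k) {
    assert(k > 0);
    KmerCounts counts;
    for (const auto& read : reads) {
        for (std::size_t i = 0; i + k <= read.size(); ++i) {
            ++counts[read.substr(i, k)];
        }
    }
    return counts;
}

// Idealized model (assumption): N error-free reads of length L drawn uniformly
// from a genome of length G. A k-mer occurring once in the genome is then
// expected to be seen about N * (L - k + 1) / G times.
double expectedKmerCount(std::size_t numReads, std::size_t readLength,
                         std::size_t genomeLength, std::size_t k) {
    assert(k > 0 && k <= readLength && genomeLength > 0);
    return static_cast<double>(numReads) *
           static_cast<double>(readLength - k + 1) /
           static_cast<double>(genomeLength);
}

// Number of G or C bases in a k-mer.
std::size_t gcCount(const std::string& kmer) {
    return static_cast<std::size_t>(std::count_if(
        kmer.begin(), kmer.end(), [](char c) { return c == 'G' || c == 'C'; }));
}

// Median of a non-empty vector (taken by value so it can be sorted).
double median(std::vector<double> values) {
    std::sort(values.begin(), values.end());
    const std::size_t n = values.size();
    return n % 2 == 1 ? values[n / 2]
                      : 0.5 * (values[n / 2 - 1] + values[n / 2]);
}

// bias[g] = median of observed/expected over k-mers with g G/C bases.
// K-mers seen fewer than minCount times are skipped (hypothetical filter so
// that likely-erroneous k-mers do not drag the medians down); bins without
// any retained k-mer default to 1.0, i.e. no bias.
std::vector<double> estimateGcBias(const KmerCounts& counts, std::size_t k,
                                   double expected, std::size_t minCount) {
    std::vector<std::vector<double>> ratios(k + 1);
    for (const auto& [kmer, count] : counts) {
        if (count >= minCount && kmer.size() == k) {
            ratios[gcCount(kmer)].push_back(static_cast<double>(count) / expected);
        }
    }
    std::vector<double> bias(k + 1, 1.0);
    for (std::size_t g = 0; g <= k; ++g) {
        if (!ratios[g].empty()) bias[g] = median(ratios[g]);
    }
    return bias;
}

enum class KmerType { Untrusted, Unique, Repetitive };

// Hypothetical thresholds relative to the G/C-bias-corrected expectation:
// below 0.5x -> untrusted, above 1.5x -> repetitive, otherwise unique.
KmerType classifyKmer(const std::string& kmer, std::size_t count,
                      double expected, const std::vector<double>& bias) {
    const double corrected = expected * bias[gcCount(kmer)];
    const double c = static_cast<double>(count);
    if (c < 0.5 * corrected) return KmerType::Untrusted;
    if (c > 1.5 * corrected) return KmerType::Repetitive;
    return KmerType::Unique;
}
```

For example, with a median bias of 2.5 for k-mers with four G/C bases and an uncorrected expectation of 2.0, a `GGCC` count of 6 is classified as unique rather than repetitive. Taking the median per G/C bin, rather than the mean, keeps a few highly repetitive k-mers from inflating the bias estimate.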