Evaluating the impact of sequencing error correction for RNA-seq data with ERCC RNA spike-in controls.

BHI(2016)

Abstract
Sequencing errors are a major issue for several next-generation sequencing-based applications such as de novo assembly and single nucleotide polymorphism detection. Several error-correction methods have been developed to improve raw data quality. However, error-correction performance is hard to evaluate because of the lack of a ground truth. In this study, we propose a novel approach that uses ERCC RNA spike-in controls as the ground truth to facilitate error-correction performance evaluation. After aligning raw and corrected RNA-seq data, we characterized the quality of reads by three metrics: mismatch patterns (e.g., the substitution rate of A to C) of reads aligned with one mismatch, mismatch patterns of reads aligned with two mismatches, and the percentage increase of reads aligned to the reference. We observed that the mismatch patterns for reads aligned with one mismatch are significantly correlated between ERCC spike-ins and real RNA samples. Based on these observations, we conclude that ERCC spike-ins can serve as ground truths for error correction, beyond their previous applications for validating dynamic range and fold-change response. In addition, the mismatch patterns for ERCC reads aligned with one mismatch can serve as a novel and reliable metric for evaluating the performance of error-correction tools.
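The three metrics described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`mismatch_pattern`, `percent_increase`, `pearson`) are hypothetical, the input is assumed to be gapless (reference, read) pairs of equal length containing only A, C, G and T, and Pearson correlation is used only as one plausible way to compare ERCC and sample patterns, since the abstract does not say which correlation measure was used.

```python
from collections import Counter
from math import sqrt

BASES = "ACGT"
# The 12 possible substitution types, e.g. ("A", "C") means reference A read as C.
SUBSTITUTIONS = [(r, q) for r in BASES for q in BASES if r != q]


def mismatch_pattern(alignments, n_mismatches=1):
    """Fraction of each substitution type among reads with exactly n mismatches.

    `alignments` is an iterable of (reference, read) string pairs.
    Assumption: gapless alignments over A/C/G/T only (no indels, no N).
    """
    counts = Counter()
    for ref, read in alignments:
        diffs = [(r, q) for r, q in zip(ref, read) if r != q]
        if len(diffs) == n_mismatches:
            counts.update(diffs)
    total = sum(counts.values())
    return [counts[s] / total if total else 0.0 for s in SUBSTITUTIONS]


def percent_increase(aligned_raw, aligned_corrected):
    """Percentage increase in aligned reads after error correction."""
    return 100.0 * (aligned_corrected - aligned_raw) / aligned_raw


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy alignments, invented purely for illustration.
    ercc = [("ACGT", "CCGT"), ("ACGT", "AGGT"), ("ACGT", "ACGT")]
    sample = [("ACGT", "CCGT"), ("ACGT", "AGGT"), ("ACGT", "CCGT")]
    print(pearson(mismatch_pattern(ercc), mismatch_pattern(sample)))
    print(percent_increase(200, 230))
```

In a real pipeline the (reference, read) pairs would be derived from an aligner's output rather than written by hand, and the same functions would be applied to both raw and corrected reads so the patterns can be compared before and after correction.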
Keywords
RNA,biomedical engineering,medical computing,ERCC RNA spike-in control,RNA sample,RNA-seq data,error correction method,nucleotide polymorphism detection,sequencing error correction