VarBen: Generating in Silico Reference Data Sets for Clinical Next-Generation Sequencing Bioinformatics Pipeline Evaluation

Ziyang Li, Shuangsang Fang, Rui Zhang, Lijia Yu, Jiawei Zhang, Dechao Bu, Liang Sun, Yi Zhao, Jinming Li

The Journal of Molecular Diagnostics (2021)

Abstract

Next-generation sequencing is increasingly being adopted as a valuable method for the detection of somatic variants in clinical oncology. However, it is still challenging to reach a satisfactory level of robustness and standardization in clinical practice when using the currently available bioinformatics pipelines to detect variants from raw sequencing data. Moreover, appropriate reference data sets are lacking for clinical bioinformatics pipeline development, validation, and proficiency testing. Herein, we developed the Variant Benchmark tool (VarBen), an open-source software for variant simulation to generate customized reference data sets by directly editing the original sequencing reads. VarBen can introduce a variety of variants, including single-nucleotide variants, small insertions and deletions, and large structural variants, into targeted, exome, or whole-genome sequencing data, and can handle sequencing data from both the Illumina and Ion Torrent sequencing platforms. To demonstrate the feasibility and robustness of VarBen, we performed variant simulation on different sequencing data sets and compared the simulated variants with real-world data. The validation study showed that the simulated data are highly comparable to real-world data and that VarBen is a reliable tool for variant simulation. In addition, our collaborative study of somatic variant calling in 20 laboratories emphasizes the need for laboratories to evaluate their bioinformatics pipelines with customized reference data sets. VarBen may help users develop and validate their bioinformatics pipelines using locally generated sequencing data.
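The idea of simulating a variant by directly editing original sequencing reads can be illustrated with a toy sketch. This is not VarBen's implementation or interface: the `Read` class, the `spike_snv` function, and its parameters are hypothetical names introduced here, and the sketch assumes gap-free alignments (no indels or clipping) and ignores base qualities, read pairs, and realignment. It only shows how a single-nucleotide variant might be written into a chosen fraction of the reads covering a position so that the edited data carry a target variant allele frequency.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal read record (not a real BAM/SAM structure)."""
    name: str
    start: int  # 0-based reference position of the first base
    seq: str    # read bases; assumed to align without gaps or clipping


def spike_snv(reads, pos, alt, vaf, seed=0):
    """Return a new list of reads in which roughly `vaf` of the reads
    covering reference position `pos` carry the base `alt` there."""
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    covering = [i for i, r in enumerate(reads)
                if r.start <= pos < r.start + len(r.seq)]
    n_edit = round(len(covering) * vaf)
    chosen = set(rng.sample(covering, n_edit))

    edited = []
    for i, r in enumerate(reads):
        if i in chosen:
            off = pos - r.start  # offset of the variant base within this read
            r = replace(r, seq=r.seq[:off] + alt + r.seq[off + 1:])
        edited.append(r)
    return edited


if __name__ == "__main__":
    reads = [Read(f"r{i}", 0, "ACGTACGTAC") for i in range(10)]
    for r in spike_snv(reads, pos=5, alt="T", vaf=0.3):
        print(r.name, r.seq)
```

Because the input reads are left unmodified and only copies are edited, the same original data set can be reused to generate several reference sets with different variants or allele frequencies.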
