Comprehensive and realistic simulation of tumour genomic sequencing data

NAR CANCER(2023)

引用 0|浏览1
暂无评分
摘要
Accurate identification of somatic mutations and allele frequencies in cancer has critical research and clinical applications. Several computational tools have been developed for this purpose but, in the absence of comprehensive 'ground truth' data, assessing the accuracy of these methods is challenging. We created a computational framework to simulate tumour and matched normal sequencing data for which the source of all loci that contain non-reference bases is known, based on a phased, personalized genome. Unlike existing methods, we account for sampling errors inherent in the sequencing process. Using this framework, we assess accuracy and biases in inferred mutations and their frequencies in an established somatic mutation calling pipeline. We demonstrate bias in existing methods of mutant allele frequency estimation and show, for the first time, the observed mutation frequency spectrum corresponding to a theoretical model of tumour evolution. We highlight the impact of quality filters on detection sensitivity of clinically actionable variants and provide definitive assessment of false positive and false negative mutation calls. Our simulation framework provides an improved means to assess the accuracy of somatic mutation calling pipelines and a detailed picture of the effects of technical parameters and experimental factors on somatic mutation calling in cancer samples. Graphical Abstract
更多
查看译文
关键词
realistic simulation,tumour
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要