SurVIndel2: improving local CNVs calling from next-generation sequencing using novel hidden information

bioRxiv (Cold Spring Harbor Laboratory), 2023

Abstract
Deletions and tandem duplications (commonly called CNVs) represent the majority of structural variations. They can be identified using short reads, but because they frequently occur in repetitive regions, existing methods fail to detect most of them. This is because CNVs in repetitive regions often do not produce the evidence that existing short-read-based callers rely on (split reads, discordant pairs or read depth changes). Here, we introduce a new short-read-based CNV caller named SurVIndel2. SurVIndel2 builds on statistical techniques we previously developed, but also employs a novel type of evidence, hidden split reads, that can uncover many CNVs missed by existing algorithms. We use public benchmarks to show that SurVIndel2 outperforms other popular callers on both human and non-human datasets. We then demonstrate the practical utility of the method by generating a catalogue of CNVs for the 1000 Genomes Project that contains hundreds of thousands of CNVs missing from the most recent public catalogue. We also show that SurVIndel2 complements the small indels predicted by Google DeepVariant, and that the two tools used in tandem produce a remarkably complete catalogue of variants in an individual. Finally, we characterise how the limitations of current sequencing technologies contribute significantly to the missing CNVs.
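One of the conventional evidence types mentioned above, discordant read pairs, can be illustrated with a minimal sketch. This is a generic textbook heuristic, not SurVIndel2's method and not its hidden split read evidence. It assumes a forward-reverse (FR) paired-end library, mates on the same chromosome, and an insert-size cutoff of mean plus k standard deviations; the names `ReadPair`, `insert_size_bounds`, `classify_pair` and the default `k=3.0` are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class ReadPair:
    # Strand of the leftmost and rightmost mate, plus the observed insert size.
    # Hypothetical structure; mates are assumed to map to the same chromosome.
    left_reverse: bool
    right_reverse: bool
    insert_size: int


def insert_size_bounds(sizes, k=3.0):
    """Return (low, high) = mean -/+ k sample standard deviations of insert sizes."""
    mu, sd = mean(sizes), stdev(sizes)
    return mu - k * sd, mu + k * sd


def classify_pair(pair, max_insert):
    """Label a pair as 'deletion', 'tandem_dup' or 'concordant' (FR library assumed)."""
    forward_then_reverse = not pair.left_reverse and pair.right_reverse
    reverse_then_forward = pair.left_reverse and not pair.right_reverse
    if forward_then_reverse and pair.insert_size > max_insert:
        # Mates map farther apart than expected: the reference carries extra
        # sequence between them that the sample lacks (deletion signal).
        return "deletion"
    if reverse_then_forward:
        # Outward-facing mates: typical of a pair spanning a tandem duplication junction.
        return "tandem_dup"
    return "concordant"


low, high = insert_size_bounds([300, 310, 290, 305, 295])
print(classify_pair(ReadPair(False, True, 1000), high))
```

As the abstract notes, a CNV inside a repetitive region may produce few or no pairs like these. That is why callers relying only on such signals miss many events.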
Keywords
next-generation sequencing, local CNVs