nPoRe: n-Polymer Realigner for improved pileup variant calling

bioRxiv (2022)

Abstract
Despite recent improvements in nanopore basecalling accuracy, germline variant calling of small insertions and deletions (INDELs) remains poor. Although precision and recall for single nucleotide polymorphisms (SNPs) now regularly exceed 99.5%, INDEL recall at relatively high coverage (85×) remains below 80% for standard R9.4.1 flow cells [22, 23, 31]. Current nanopore variant callers work in two stages: an efficient pileup-based method identifies candidates of interest, and then a more expensive full-alignment model provides the final variant calls. Most false-negative INDELs are lost during the first (pileup-based) step, particularly in low-complexity repeated regions. We show that read phasing and realignment can recover a significant portion of the INDELs lost during this stage. In particular, we extend Needleman-Wunsch affine gap alignment by introducing new gap penalties for more accurately aligning repeated n-polymer sequences such as homopolymers (n = 1) and tandem repeats (2 ≤ n ≤ 6). On our dataset with 60.6× coverage, haplotype phasing improves INDEL recall in all evaluated high-confidence regions from 63.76% to 70.66%, and nPoRe realignment then improves it further to 73.04%, with no loss of precision.

Competing Interest Statement
The authors have declared no competing interest.
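To make "Needleman-Wunsch affine gap alignment with n-polymer-aware gap penalties" concrete, here is a minimal Python sketch of the general idea. It is not nPoRe's algorithm. It is a standard Gotoh-style global aligner (three matrices: match, deletion, insertion) in which a gap that touches an n-polymer run in the reference opens with a smaller, hypothetical penalty. The helper `npolymer_mask`, its `min_copies=3` threshold, the scoring defaults, and the single discounted `repeat_gap_open` value are all illustrative assumptions. nPoRe's actual penalties, which depend on the repeat structure, are defined in the paper and are not reproduced here.

```python
from typing import List, Optional

NEG_INF = float("-inf")


def npolymer_mask(ref: str, max_n: int = 6, min_copies: int = 3) -> List[bool]:
    """Mark reference positions covered by >= min_copies consecutive copies of
    some unit of length 1..max_n (n = 1 is a homopolymer).
    Hypothetical helper; the thresholds are illustrative assumptions."""
    mask = [False] * len(ref)
    for n in range(1, max_n + 1):
        for i in range(len(ref) - n + 1):
            unit = ref[i:i + n]
            j = i + n
            # Extend while the next n bases repeat the same unit.
            while ref[j:j + n] == unit:
                j += n
            if (j - i) // n >= min_copies:
                for k in range(i, j):
                    mask[k] = True
    return mask


def align_score(ref: str, read: str, match: int = 2, mismatch: int = 4,
                gap_open: int = 4, gap_extend: int = 2,
                repeat_gap_open: Optional[int] = None) -> float:
    """Global alignment score with affine gaps (Gotoh). A gap of length k
    costs open + k * gap_extend. If repeat_gap_open is given, gaps touching
    an n-polymer run in `ref` use it as the open penalty (toy assumption)."""
    R, Q = len(ref), len(read)
    mask = npolymer_mask(ref)
    rep_open = gap_open if repeat_gap_open is None else repeat_gap_open

    def open_cost(in_repeat: bool) -> int:
        # Cost of the first base of a new gap (open + one extension).
        return (rep_open if in_repeat else gap_open) + gap_extend

    # M: ref[i-1] aligned to read[j-1]; X: ref[i-1] deleted; Y: read[j-1] inserted.
    M = [[NEG_INF] * (Q + 1) for _ in range(R + 1)]
    X = [[NEG_INF] * (Q + 1) for _ in range(R + 1)]
    Y = [[NEG_INF] * (Q + 1) for _ in range(R + 1)]
    M[0][0] = 0.0
    for i in range(R + 1):
        for j in range(Q + 1):
            if i > 0 and j > 0:
                s = match if ref[i - 1] == read[j - 1] else -mismatch
                M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            if i > 0:
                # Deletion of ref[i-1]: discounted if that base is in a repeat.
                oc = open_cost(mask[i - 1])
                X[i][j] = max(M[i - 1][j] - oc, Y[i - 1][j] - oc,
                              X[i - 1][j] - gap_extend)
            if j > 0:
                # Insertion between ref[i-1] and ref[i]: discounted if either
                # flanking reference base is in a repeat.
                near_rep = (i > 0 and mask[i - 1]) or (i < R and mask[i])
                oc = open_cost(near_rep)
                Y[i][j] = max(M[i][j - 1] - oc, X[i][j - 1] - oc,
                              Y[i][j - 1] - gap_extend)
    return max(M[R][Q], X[R][Q], Y[R][Q])
```

For a read that is missing one base of a 6-base homopolymer (`ref = "GCAAAAAAGC"`, `read = "GCAAAAAGC"`), the flat affine scheme scores 9 matches minus one gap: 18 − (4 + 2) = 12. With `repeat_gap_open=1`, the same deletion costs only 1 + 2, giving 15. A gap in non-repetitive sequence is unaffected. Lowering gap costs only inside repeats is one simple way to express the intuition that nanopore length errors concentrate in homopolymers and tandem repeats. A realistic model, nPoRe's included, would condition the penalty on the repeat unit and copy number rather than applying one flat discount.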
Keywords
Germline variant calling, Alignment, N-polymer, Homopolymer, Short tandem repeat, Copy number, Nanopore sequencing, Variable gap penalty