Haplotype-phasing of long-read HiFi data to enhance structural variant detection through a Skip-Gram model

Can Luo, Parth A. Datar, Yichen Henry Liu, Xin Zhou

2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2022)

Abstract
Haplotype-resolved assembly is the optimal approach for characterizing structural variants (SVs) in the human genome. Current long-read technologies, such as high-fidelity (HiFi) reads, provide both low sequencing error and long-range information, which together enable phased, diploid assembly. Here we introduce embSV, a haplotype-phasing-assisted, diploid-assembly-based SV detection tool that uses a word-embedding model inspired by natural language processing to faithfully partition HiFi reads for local haplotype-specific assembly. The genome-wide haplotype-resolved assembly then allows us to comprehensively detect SVs. To thoroughly test the performance and robustness of embSV and compare it with existing SV callers, we ran several benchmarking experiments. Tuning the parameters that affect breakpoint shift and alternate-sequence similarity between called SVs and the gold standard shows that embSV outperforms existing tools and is the most robust to parameter changes.
Keywords
structural variant, phasing, diploid assembly, NLP, long reads, PacBio