Deep learning for assembly of haplotypes and viral quasispecies from short and long sequencing reads

Bioinformatics, Computational Biology and Biomedicine (2022)

Abstract
Information about genetic variations in either individual genomes or viral populations provides insight into the genetic signatures of diseases and suggests directions for medical and pharmaceutical research. State-of-the-art sequencing platforms generate massive amounts of reads, with lengths varying from one technology to another, that provide the data needed for the reconstruction of haplotypes and viral quasispecies. On the one hand, high-throughput platforms are capable of providing enormous amounts of highly accurate but relatively short reads; their inability to bridge long genetic distances renders reconstruction with such reads challenging. On the other hand, the latest generation of sequencing technologies is capable of generating much longer reads, but those reads suffer from sequencing errors at a rate higher than the error rate of short reads. This motivates the search for reconstruction methods capable of leveraging both the high accuracy of short reads and the phase-resolving power of long reads. We present a deep learning framework that relies on convolutional auto-encoders with a clustering layer to reconstruct individual haplotypes or viral populations from hybrid data sources. First, an auto-encoder for haplotype assembly / viral population reconstruction from short reads is pre-trained separately from another one utilizing long reads for the same task. The pre-trained models are then retrained simultaneously to enable decision fusion. Results on realistic synthetic as well as experimental data demonstrate that the proposed framework outperforms state-of-the-art techniques for haplotype assembly and viral quasispecies reconstruction, and achieves significantly higher accuracy on those tasks than methods utilizing only one type of reads. Code is available at https://github.com/WuLoli/HybSeq.
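To make the reconstruction task concrete, the following is a minimal sketch of one downstream step: turning clustered reads into consensus haplotypes by per-site majority vote. It is not taken from the HybSeq code. The function name `consensus_haplotypes`, the +1/-1/0 read encoding, and the toy data are all assumptions made for illustration, and the cluster labels are simply given here, whereas in the framework described above they would come from the auto-encoders' clustering layer.

```python
import numpy as np

def consensus_haplotypes(reads, labels, k):
    """Per-site majority vote within each cluster (hypothetical helper).

    reads:  (n_reads, n_sites) array, +1/-1 for the two alleles, 0 if uncovered.
    labels: (n_reads,) cluster index of each read, assumed already computed.
    Returns a (k, n_sites) array in {+1, -1, 0}; 0 marks ties or no coverage.
    """
    reads = np.asarray(reads)
    labels = np.asarray(labels)
    haps = np.zeros((k, reads.shape[1]), dtype=int)
    for c in range(k):
        votes = reads[labels == c].sum(axis=0)  # signed vote per site
        haps[c] = np.sign(votes)
    return haps

# Toy data (assumed): two haplotypes over four variant sites.
# Short reads are error-free but each covers only two sites.
short_reads = np.array([
    [ 1,  1,  0,  0],
    [ 1,  1,  0,  0],
    [-1, -1,  0,  0],
    [-1, -1,  0,  0],
    [ 0,  0,  1, -1],
    [ 0,  0,  1, -1],
    [ 0,  0, -1,  1],
    [ 0,  0, -1,  1],
])
short_labels = np.array([0, 0, 1, 1, 0, 0, 1, 1])

# Long reads span all sites; the first one has an error at the last site.
long_reads = np.array([
    [ 1,  1,  1,  1],
    [-1, -1, -1,  1],
])
long_labels = np.array([0, 1])

# Pool both read types before voting.
reads = np.vstack([short_reads, long_reads])
labels = np.concatenate([short_labels, long_labels])
haps = consensus_haplotypes(reads, labels, k=2)
print(haps)
```

In this toy case the long reads alone would carry the sequencing error into the first haplotype, while pooling them with the short reads lets the vote correct it. The pooling shown here is only a stand-in for the decision fusion mentioned in the abstract, whose actual form is not specified there.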