Scalable De Novo Genome Assembly Using Pregel
2018 IEEE 34th International Conference on Data Engineering (ICDE)(2018)
摘要
De novo genome assembly is the process of stitching short DNA sequences to generate longer DNA sequences, without using any reference sequence for alignment. It enables high-throughput genome sequencing and thus accelerates the discovery of new genomes. In this paper, we present a toolkit, called PPA-assembler, for de novo genome assembly in a distributed setting. The operations in our toolkit provide strong performance guarantees, and can be assembled to implement various sequencing strategies. PPA-assembler adopts the popular {\em de Bruijn graph} based approach for sequencing, and each operation is implemented as a program in Google's Pregel framework for big graph processing. Experiments on large real and simulated datasets demonstrate that PPA-assembler is much more efficient than the state-of-the-arts and provides good sequencing quality.
更多查看译文
关键词
genome assembly,Pregel,de Bruijn graph,DNA sequence
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要