HipMer: An Extreme-Scale De Novo Genome Assembler

SC (2015)

Abstract
De novo whole-genome assembly reconstructs genomic sequences from short, overlapping, and potentially erroneous DNA segments, and it is one of the most important computations in modern genomics. This work presents HipMer, the first high-quality end-to-end de novo assembler designed for extreme-scale analysis, built by efficiently parallelizing the Meraculous code. First, we significantly improve the scalability of parallel k-mer analysis for complex repetitive genomes that exhibit skewed frequency distributions. Next, we optimize the traversal of the de Bruijn graph of k-mers by employing a novel communication-avoiding parallel algorithm in a variety of use-case scenarios. Finally, we parallelize the Meraculous scaffolding modules by leveraging the one-sided communication capabilities of Unified Parallel C while effectively mitigating load imbalance. Large-scale results on a Cray XC30 using grand-challenge genomes demonstrate efficient performance and scalability on thousands of cores. Overall, our pipeline accelerates Meraculous by orders of magnitude, enabling the complete assembly of the human genome in just 8.4 minutes on 15K cores of the Cray XC30 and creating unprecedented capability for extreme-scale genomic analysis.
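The abstract names two core stages, k-mer analysis and de Bruijn graph traversal. The following is a minimal serial Python sketch of those ideas under stated assumptions, not HipMer's implementation: it ignores reverse complements, base qualities, skewed-frequency handling, and all inter-process communication. The frequency cutoff `min_count`, the CRC32-based `owner` function used to illustrate hash partitioning of k-mers across processes, and every function name here are hypothetical illustrations.

```python
import zlib
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def owner(kmer, nranks):
    """Hypothetical owner rule: hash a k-mer to one of nranks processes.
    A distributed counter would send each k-mer to its owner for counting."""
    return zlib.crc32(kmer.encode()) % nranks


def build_graph(counts, min_count):
    """Keep k-mers seen at least min_count times (a crude error filter)."""
    return {kmer for kmer, c in counts.items() if c >= min_count}


def extensions(kmer, kmers, forward=True):
    """Bases that extend kmer to another retained k-mer (right or left)."""
    out = []
    for base in "ACGT":
        nbr = kmer[1:] + base if forward else base + kmer[:-1]
        if nbr in kmers:
            out.append(base)
    return out


def walk_contigs(kmers):
    """Walk unambiguous (unique-in, unique-out) paths of the graph into contigs."""
    visited, contigs = set(), []
    for seed in sorted(kmers):
        if seed in visited:
            continue
        visited.add(seed)
        contig = seed
        cur = seed
        while True:  # extend to the right
            ext = extensions(cur, kmers, forward=True)
            if len(ext) != 1:
                break
            nxt = cur[1:] + ext[0]
            # Stop at a branch point or a k-mer already placed in a contig.
            if nxt in visited or len(extensions(nxt, kmers, forward=False)) != 1:
                break
            visited.add(nxt)
            contig += ext[0]
            cur = nxt
        cur = seed
        while True:  # extend to the left
            ext = extensions(cur, kmers, forward=False)
            if len(ext) != 1:
                break
            prv = ext[0] + cur[:-1]
            if prv in visited or len(extensions(prv, kmers, forward=True)) != 1:
                break
            visited.add(prv)
            contig = ext[0] + contig
            cur = prv
        contigs.append(contig)
    return contigs


reads = ["ATGCGT", "GCGTAC", "ATGCGT", "GCGTAC", "ATGAGT"]  # last read has an error
counts = count_kmers(reads, 4)
graph = build_graph(counts, min_count=2)
print(walk_contigs(graph))  # ['ATGCGTAC']
```

In this toy run the k-mers from the single erroneous read occur only once and are filtered out, so the remaining k-mers form one unbranched path that is walked into a single contig. Real genomes contain repeats that create branches; a sketch like this simply stops contigs at those branch points.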
Keywords
HipMer, de novo genome assembler, genome assembly, Meraculous code parallelization, parallel k-mer analysis, complex repetitive genomes, skewed frequency distribution, de Bruijn graph, communication-avoiding parallel algorithm, use-case scenario, Meraculous scaffolding module, Unified Parallel C, extreme-scale genomic analysis