Darwin: A Hardware-acceleration Framework for Genomic Sequence Alignment

bioRxiv(2017)

Abstract
Genomics is set to transform medicine and our understanding of life in fundamental ways. But the growth in genomics data has been overwhelming, far outpacing Moore's Law. The advent of third generation sequencing technologies is providing new insights into the genomic contribution to diseases with complex mutation events, but these technologies have prohibitively high computational costs. Over 1,300 CPU hours are required to align reads from a 54X coverage of the human genome to a reference, and over 15,600 CPU hours to assemble the reads de novo. This paper proposes Darwin, a hardware-accelerated framework for genomic sequence alignment that, without sacrificing sensitivity, provides 125X and 15.6X speedup over the state-of-the-art software counterparts for reference-guided and de novo assembly of third generation sequencing reads, respectively. For pairwise alignment of sequences, Darwin is over 39,000X more energy-efficient than software. Darwin uses (i) a novel filtration strategy, called m-SIFT, to reduce the search space for sequence alignment at high speed, and (ii) a hardware-accelerated version of GACT, a novel algorithm to generate near-optimal alignments of arbitrarily long genomic sequences using constant memory for trace-back. Darwin's framework is general-purpose, with tunable speed and sensitivity to match the requirements of genomic applications even beyond sequencing.
Keywords
Genomic sequence alignment, Long reads, Hardware acceleration, Resequencing, De novo assembly