BOA: A Partitioned View of Genome Assembly

bioRxiv (2022)

Abstract
De novo genome assembly is a fundamental problem in computational molecular biology: reconstructing an unknown genome sequence from a set of short DNA sequences (reads) obtained from that genome. High-throughput sequencers can generate several billion such reads in a single run. However, the relative ordering of the reads along the target genome is not known a priori, and this lack of information is one of the main contributors to the complexity of the assembly process. State-of-the-art approaches typically produce an ordering of the reads only toward the end of the assembly process, too late to benefit from the ordering information. In this paper, with the dual objective of improving assembly quality and exposing a high degree of parallelism for assemblers, we present a partitioning-based approach. Our framework, which we call BOA (for bucket-order-assemble), uses bucketing alongside graph- and hypergraph-based partitioning techniques to produce a partial ordering of the reads. This partial ordering lets us divide the read set into disjoint blocks that can be assembled independently and in parallel using any state-of-the-art serial assembler. We tested the BOA framework on a variety of genomes. Experimental results show that the hypergraph variant of our approach, Hyper-BOA, consistently improves both overall assembly quality and performance. For the inputs tested, Hyper-BOA improves the N50 values of the popular standalone MEGAHIT assembler by 1.70x on average and by up to 2.13x, while the largest alignment length improves by 1.47x on average and by up to 1.94x. The time to solution also improves consistently, by 3-4x for the system sizes tested.

Competing Interest Statement: The authors have declared no competing interest.
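The abstract reports quality gains in terms of N50. For readers unfamiliar with the metric, the following is a minimal sketch of the standard N50 definition (the length of the contig at which the running total, taken from longest to shortest, first covers at least half of the assembled bases). It is not code from the BOA framework; the function name `n50` and the contig lengths in the usage note are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of a set of contig lengths (0 for an empty set)."""
    total = sum(contig_lengths)
    running = 0
    # Walk the contigs from longest to shortest, accumulating length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig where the running sum covers
        # at least half of the total assembled length.
        if 2 * running >= total:
            return length
    return 0
```

For example, with hypothetical contig lengths `[80, 70, 50, 40, 30, 20]` the total is 290, and the running sum first reaches half of that (145) at the second contig, so `n50` returns 70. An improvement factor such as those quoted above is the ratio of two such values computed on the assemblies being compared.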
Keywords
Genomics, Bioinformatics, High-performance computing in bioinformatics, Algorithms