Hardware Acceleration of Long Read Pairwise Overlapping in Genome Sequencing: A Race Between FPGA and GPU

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (2019)

Abstract
In genome sequencing, detecting potential overlaps between every pair of input reads is a crucial but time-consuming task, especially when the reads are ultra-long. The state-of-the-art overlapping tool Minimap2 outperforms other popular tools in both speed and accuracy. It has a single computing hot-spot, chaining, which takes 70% of the runtime and needs to be accelerated. The nature of chaining raises several crucial issues for hardware acceleration. First, the original computation pattern is poorly parallelizable, and a direct implementation results in low utilization of the parallel processing units. We propose a method to reorder the operation sequence that transforms the algorithm into a hardware-friendly form. Second, the large but variable sizes of the input data make it hard to leverage task-level parallelism. We therefore customize a fine-grained task dispatching scheme that keeps the parallel processing elements (PEs) busy while satisfying the on-chip memory restriction. Based on these optimizations, we map the algorithm to a fully pipelined streaming architecture on FPGA using high-level synthesis (HLS), which achieves a significant performance improvement. The principles of our acceleration design apply to both FPGA and GPU. Compared to the multi-threaded CPU baseline, our GPU accelerator achieves a 7x speedup, while our FPGA accelerator achieves a 28x speedup. We further conduct an architecture study to quantitatively analyze the architectural reasons for the performance difference. The summarized insights can serve as a guide for choosing the proper hardware acceleration platform.
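The abstract does not spell out the reordering, so the following Python sketch shows only one plausible reading of it, not the authors' implementation and not Minimap2's actual scoring. The function names, the simplified `pair_score`, the lookback window `h`, the anchor width `w`, and the sample anchors are all assumptions made for illustration. The backward form computes each anchor's chain score by scanning up to `h` predecessors, which is a reduction into a single value. The forward form lets each anchor push its already-final score to up to `h` successors, so the inner loop writes to distinct locations and has a fixed shape that maps more naturally onto parallel lanes.

```python
from typing import List, Optional, Tuple

Anchor = Tuple[int, int]  # (reference position, query position); hypothetical layout


def pair_score(a: Anchor, b: Anchor, w: int) -> Optional[int]:
    """Simplified, assumed score for extending a chain from anchor a to anchor b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx <= 0 or dy <= 0:
        return None  # b does not follow a on both sequences
    gap = abs(dx - dy)          # diagonal drift between the two anchors
    return min(dx, dy, w) - gap  # matched bases minus a toy linear gap cost


def chain_backward(anchors: List[Anchor], h: int = 4, w: int = 15) -> List[int]:
    """Original order: for each anchor i, look back at up to h predecessors."""
    f = [w] * len(anchors)
    for i in range(len(anchors)):
        for j in range(max(0, i - h), i):
            s = pair_score(anchors[j], anchors[i], w)
            if s is not None:
                f[i] = max(f[i], f[j] + s)
    return f


def chain_forward(anchors: List[Anchor], h: int = 4, w: int = 15) -> List[int]:
    """Reordered: each anchor j pushes its final score to up to h successors."""
    n = len(anchors)
    f = [w] * n
    for j in range(n):
        # f[j] is final here: every predecessor of j was processed earlier.
        # The updates below touch distinct f[i], so they are independent.
        for i in range(j + 1, min(n, j + h + 1)):
            s = pair_score(anchors[j], anchors[i], w)
            if s is not None:
                f[i] = max(f[i], f[j] + s)
    return f


# Anchors sorted by reference position; the fifth one is an off-diagonal outlier.
example = [(0, 0), (20, 20), (40, 41), (60, 59), (80, 200), (100, 100)]
```

Both functions visit the same set of (j, i) pairs, so `chain_backward(example)` and `chain_forward(example)` return the same score list. Only the iteration order differs, and that order is what a pipelined FPGA design or a GPU kernel would exploit.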
Keywords
Field programmable gate arrays,Graphics processing units,Task analysis,Acceleration,Tools,Heuristic algorithms,Genomics