Cpu-Fpga Co-Optimization For Big Data Applications: A Case Study Of In-Memory Samtool Sorting

FPGA '17: The 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Monterey California USA February, 2017(2017)

引用 5|浏览123
暂无评分
摘要
To efficiently process a tremendous amount of data, today's big data applications tend to distribute the datasets into multiple partitions, such that each partition can be fit into memory and be processed by a separate core/server in parallel. Meanwhile, due to the limited scaling of general-purpose CPUs, FPGAs have emerged as an attractive alternative to accelerate big data applications due to their low power, high performance and energy efficiency. In this paper we aim to answer one key question: How should the multicore CPU and FPGA coordinate together to optimize the performance of big data applications? To address the above question, we conduct a step-by-step case study to perform CPU and FPGA co-optimization for in-memory Samtool sorting in genomic data processing, which is one of the most important big data applications for personalized healthcare. First, to accelerate the time-consuming compression algorithm and its associated cyclic redundancy check (CRC) in Samtool sorting, we implement a portable and maintainable FPGA accelerator using high-level synthesis (HLS). Although FPGAs are traditionally well-known to be suitable for compression and CRC, we find that a straightforward integration of this FPGA accelerator into the multi-threaded Samtool sorting only achieves marginal system throughput improvement over the software baseline running on a 12-core CPU. To improve system performance, we propose a dataflow execution model to effectively orchestrate the computation between the multi-threaded CPU and FPGA. Experimental results show that our co-optimized CPU-FPGA system achieves a 2.6x speedup for in-memory Samtool sorting.
更多
查看译文
关键词
Compression and CRC,Genome data sorting,Dataflow execution
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要