Automatic Creation of High-Bandwidth Memory Architectures from Domain-Specific Languages: The Case of Computational Fluid Dynamics

ArXiv (2022)

Abstract
Numerical simulations are increasingly used for solving complex problems. Most of these algorithms are massively parallel and can benefit from the spatial parallelism offered by reconfigurable logic. Modern FPGA devices can benefit from high-bandwidth memory technologies, but most of these applications are memory-bound and require designers to craft advanced communication and memory architectures for efficient data movement and on-chip storage. This development process requires hardware design skills that are uncommon in domain-specific experts. In this paper, we propose an automated tool flow from a domain-specific language (DSL) to generate massively parallel accelerators on FPGAs to address these challenges. We use the case of computational fluid dynamics (CFD) as a paradigmatic example. Our flow starts from the high-level specification of tensor operations and combines an MLIR-based compiler with an in-house hardware generation flow to automatically design systems. These systems integrate several parallel accelerators that operate on independent data and a specialized memory architecture that moves data efficiently, aiming at fully exploiting the available CPU-FPGA bandwidth for data transfers. We simulated applications with millions of elements, achieving up to 100 GFLOPS with one compute unit and custom precision when targeting a Xilinx Alveo U280. Our FPGA implementation is almost 25× more energy efficient than Intel implementations. We also discuss how to address practical limitations when scaling up the parallelism with multiple computing units on the same FPGA board.