Performance improvement by enhancing spatial parallelism on FPGA for HPC applications

2023 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), 2023

In today’s HPC systems, GPUs, which offer high computational performance and memory bandwidth at relatively low power consumption, are the leading players. However, GPU-based acceleration is designed to excel when it utilizes many computation cores and performs synchronized computation in a SIMD/STMD manner over a large number of uniform data array elements. Therefore, a GPU may not fully exploit its computational performance in calculations with low parallelism, in complex operations involving conditional branching, or in parallel applications whose frequent inter-node communication interrupts continuous computing on the GPU devices. One alternative for accelerated computing is the FPGA (Field-Programmable Gate Array), especially given recent devices that contain a large number of logic elements, high memory bandwidth, and even multiple channels of high-speed optical interconnection interfaces, each reaching up to 100 Gbps. The performance of an FPGA is based on pipeline parallelism, which enables the computation stream to continue even across conditional branches.
Keywords: FPGA, OpenACC, OpenCL, compiler, HPC, spatial parallelism, memory utilization