Implementation and Performance Evaluation of Collective Communications Using CIRCUS on Multiple FPGAs.

HPC Asia Workshops (2023)

In the high-performance computing domain, the Field Programmable Gate Array (FPGA) is a novel accelerator that exhibits high flexibility and performance characteristics distinct from those of other accelerators such as the Graphics Processing Unit (GPU). Recent high-end FPGAs are equipped with multiple channels of high-speed optical links, each offering up to 100 Gbps. This is a crucial feature when constructing PC clusters with FPGAs as accelerators; however, it is not easy to use from user kernels, because it is implemented at a low level and offers only simple direct communication between neighboring FPGAs. To provide communication between FPGAs in accelerated PC clusters, we developed a communication system named CIRCUS, which provides a user-friendly API usable from OpenCL and is equipped with a routing function for multi-hop communication over a multi-dimensional torus network of FPGAs. However, the current CIRCUS provides only point-to-point communication between source and destination FPGAs. In ordinary parallel processing environments such as MPI, user programs perform message passing with various collective communication functions for parallel algorithms, for instance Allreduce and Allgather. In this paper, we implement collective communication functions over CIRCUS for user-friendly programming of ordinary parallel algorithms on FPGAs. As the first target, we implement the Allreduce function, which is the most essential and important of these functions. The paper briefly describes the CIRCUS system, followed by the design, implementation, and preliminary performance evaluation on Intel Stratix 10 FPGAs.