The Cost of Flexibility: Embedded versus Discrete Routers in CGRAs for HPC

2022 IEEE International Conference on Cluster Computing (CLUSTER), 2022

Abstract
Coarse-Grained Reconfigurable Arrays (CGRAs) are a class of reconfigurable architectures that inherit the performance and usability properties of Central Processing Units (CPUs) and the reconfigurability of Field-Programmable Gate Arrays (FPGAs). Historically, CGRAs have been used successfully to accelerate embedded applications, and today they are also being considered for accelerating High-Performance Computing (HPC) applications in future supercomputers. However, embedded systems and supercomputers are vastly different domains with different applications and constraints, and it is not yet fully understood which CGRA design decisions adequately serve the HPC market. One such open design decision concerns the interconnect that carries intra-CGRA communication. Today, intra-CGRA communication comes in two flavors: routers embedded closely into the compute units, or discrete routers placed outside the compute units. The former trades flexibility for lower hardware cost, while the latter offers greater flexibility but consumes more resources. In this paper, we aim to understand which of the two designs best suits CGRAs targeting HPC. We extend our previous methodology, which consists of a parameterized CGRA design and an OpenMP-capable compiler, to support both routing designs, including verification tests using RTL simulation. Our results show that the discrete router design enables better use of processing elements (PEs) than embedded routers and can achieve up to a 79.27% reduction in unnecessary PE occupancy for an aggressively unrolled stencil kernel on an 18 × 16 CGRA, at an estimated hardware resource overhead of 6.3×. This reduction in PE occupancy can be used, for example, to exploit instruction-level parallelism (ILP) through even more aggressive unrolling.
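To make the flexibility trade-off more concrete, here is a minimal Python sketch. It is not the authors' CGRA model or compiler. It assumes a 2D mesh of PEs, dimension-ordered (column-first, then row) routing, and a simplified occupancy rule: with embedded routers, every intermediate PE on a route is tied up forwarding the value, while with discrete routers values travel through a separate router fabric and intermediate PEs stay free. The names `xy_route` and `occupied_pes` and the three-operation placement are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy model: a PE is identified by its (row, col) position in a 2D mesh.
Coord = Tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the intermediate PEs of a dimension-ordered (column-first, then row) route.

    The source and destination PEs themselves are excluded.
    """
    r, c = src
    hops: List[Coord] = []
    while c != dst[1]:  # move along the row first
        c += 1 if dst[1] > c else -1
        hops.append((r, c))
    while r != dst[0]:  # then along the column
        r += 1 if dst[0] > r else -1
        hops.append((r, c))
    return hops[:-1]  # the last hop is the destination PE


def occupied_pes(placement: Dict[str, Coord],
                 edges: List[Tuple[str, str]],
                 embedded_routers: bool) -> Set[Coord]:
    """PEs that cannot host other work under the assumed routing model."""
    occupied = set(placement.values())  # PEs executing an operation
    if embedded_routers:
        # Assumption: with embedded routers, every PE a value passes through
        # is tied up forwarding it; with discrete routers it stays free.
        # Using a set avoids double-counting PEs shared by several routes.
        for producer, consumer in edges:
            occupied.update(xy_route(placement[producer], placement[consumer]))
    return occupied


if __name__ == "__main__":
    placement = {"load": (0, 0), "mul": (0, 3), "store": (2, 3)}
    edges = [("load", "mul"), ("mul", "store")]
    embedded = occupied_pes(placement, edges, embedded_routers=True)
    discrete = occupied_pes(placement, edges, embedded_routers=False)
    print(f"embedded: {len(embedded)} PEs, discrete: {len(discrete)} PEs")
```

Running it prints `embedded: 6 PEs, discrete: 3 PEs`: under the embedded-router assumption, three PEs do nothing but forward data, which illustrates one plausible source of unnecessary PE occupancy. The sketch deliberately ignores router port contention, routing latency, and the extra hardware cost of discrete routers.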
Keywords
CGRA, routing architecture, design space exploration, HPC, RTL simulation