CASCADE: High Throughput Data Streaming via Decoupled Access-Execute CGRA

ACM Transactions on Embedded Computing Systems (TECS)(2019)

引用 17|浏览12
暂无评分
摘要
A Coarse-Grained Reconfigurable Array (CGRA) is a promising high-performance low-power accelerator for compute-intensive loop kernels. While the mapping of the computations on the CGRA is a well-studied problem, bringing the data into the array at a high throughput remains a challenge. A conventional CGRA design involves on-array computations to generate memory addresses for data access undermining the attainable throughput. A decoupled access-execute architecture, on the other hand, isolates the memory access from the actual computations resulting in a significantly higher throughput. We propose a novel decoupled access-execute CGRA design called CASCADE with full architecture and compiler support for high-throughput data streaming from an on-chip multi-bank memory. CASCADE offloads the address computations for the multi-bank data memory access to a custom designed programmable hardware. An end-to-end fully-automated compiler synchronizes the conflict-free movement of data between the memory banks and the CGRA. Experimental evaluations show on average 3× performance benefit and 2.2× performance per watt improvement for CASCADE compared to an iso-area conventional CGRA with a bigger processing array in lieu of a dedicated hardware memory address generation logic.
更多
查看译文
关键词
Coarse grained reconfigurable arrays, decoupled access-execute architectures, multi-bank memory partitioning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要