Interconnect synthesis of heterogeneous accelerators in a shared memory architecture

2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), 2015

Abstract
An accelerator-rich architecture (ARA) is composed of heterogeneous accelerators with an on-chip memory system. Compared to a general-purpose processor, an accelerator demands short and predictable latency to its local on-chip memory to meet its performance target. Moreover, an accelerator requires much higher off-chip memory bandwidth than a CPU, since it consumes much more data in a given time period. A customized on-chip memory system design is therefore one of the keys to an efficient ARA. In this work we provide a two-layer interconnect synthesis method. The first layer is an optimal partial crossbar that connects the heterogeneous accelerators and the shared memory banks with a minimum number of switches. The second layer interleaves potentially conflicting long-burst memory requests that prefetch data from off-chip memory. The experimental results show that the partial crossbar uses more than 45% fewer switches than the best known method. This in turn leads to a 53% reduction in LUTs and a 34% reduction in slice utilization on a 30-accelerator FPGA prototype. Furthermore, a well-designed interleaved network improves the performance of an ARA by 36% to 52% in a real ARA prototype for medical imaging applications. This prototype also shows a 7.44x energy efficiency gain over state-of-the-art Xeon processors.
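To make the idea of a partial crossbar concrete, the following is a minimal sketch and not the synthesis method of the paper. It assumes a toy system with 4 accelerator ports and 3 memory banks in which at most 2 ports are active at once; the port and bank numbering, the helper names `can_route` and `switch_count`, and the example connection lists are all hypothetical. A crossbar is modelled as a map from each port to the banks it has a switch to, and a set of active ports is routable if each port can be given a distinct bank, which is checked with a standard bipartite matching.

```python
from itertools import combinations


def can_route(active_ports, switches):
    """Return True if every active port can be assigned its own bank.

    `switches` maps a port to the list of banks it has a switch to.
    Uses augmenting-path bipartite matching (Kuhn's algorithm).
    """
    match = {}  # bank -> port currently assigned to it

    def try_assign(port, seen):
        for bank in switches[port]:
            if bank in seen:
                continue
            seen.add(bank)
            # Take a free bank, or move the current owner elsewhere.
            if bank not in match or try_assign(match[bank], seen):
                match[bank] = port
                return True
        return False

    return all(try_assign(p, set()) for p in active_ports)


def switch_count(switches):
    """Total number of port-to-bank switches in the crossbar."""
    return sum(len(banks) for banks in switches.values())


# Hypothetical toy system: 4 ports, 3 banks, at most 2 ports active at once.
full = {p: [0, 1, 2] for p in range(4)}                  # full crossbar
partial = {0: [0, 1], 1: [1, 2], 2: [0, 2], 3: [0, 1]}   # illustrative partial crossbar

full_switches = switch_count(full)        # 4 ports * 3 banks
partial_switches = switch_count(partial)  # 2 switches per port
all_pairs_routable = all(
    can_route(pair, partial) for pair in combinations(range(4), 2)
)
```

In this toy case the partial crossbar still routes every pair of active ports while using 8 switches instead of 12. The sketch only checks a given design; finding a design with the minimum number of switches for real accelerator and bank counts is the optimization problem the abstract refers to.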
Keywords
Interconnect synthesis, partial crossbar, heterogeneous accelerators, shared memory architecture