Fusion: Design Tradeoffs in Coherent Cache Hierarchies for Accelerators

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), 2015

Cited by 21
Abstract
Chip designers have shown increasing interest in integrating specialized fixed-function coprocessors into multicore designs to improve energy efficiency. Recent work in academia [11, 37] and industry has sought to enable more fine-grain offloading at the granularity of functions and loops. The sequential program now needs to migrate across the chip, utilizing the appropriate accelerator for each program region. As the execution migrates, it has become increasingly challenging to retain the temporal and spatial locality of the original program, as well as to manage data sharing. We show that with the increasing energy cost of wires and caches relative to compute operations, it is imperative to optimize data movement to retain the energy benefits of accelerators. We develop FUSION, a lightweight coherent cache hierarchy for accelerators, and study the tradeoffs compared to a scratchpad-based architecture. We find that coherence, both between the accelerators and with the CPU, can help minimize data movement and save energy. FUSION leverages temporal coherence [32] to optimize data movement within the accelerator tile. The accelerator tile includes small per-accelerator L0 caches to minimize hit energy and a per-tile shared cache to improve localized sharing between accelerators and minimize data exchanges with the host LLC. We find that overall FUSION improves performance by 4.3x compared to an oracle DMA that pushes data into the scratchpad. In workloads with inter-accelerator sharing we save up to 10x the dynamic energy of the cache hierarchy by minimizing host-accelerator data ping-ponging.
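The lookup path the abstract describes (a small per-accelerator L0 cache backed by a per-tile shared cache, falling through to the host LLC) can be sketched with a toy model. This is not the paper's simulator: the cache capacities, the LRU policy, and the access pattern below are arbitrary assumptions chosen only to illustrate how a per-tile cache absorbs accesses that miss a small L0 before they reach the LLC.

```python
# Toy model of a FUSION-style hierarchy: per-accelerator L0 -> per-tile
# shared cache -> host LLC. All sizes/policies are illustrative assumptions.

class ToyCache:
    """Fully associative cache with LRU replacement (illustrative only)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = []            # LRU order: most recently used last

    def access(self, addr):
        if addr in self.lines:     # hit: refresh LRU position
            self.lines.remove(addr)
            self.lines.append(addr)
            return True
        self.lines.append(addr)    # miss: fill, evicting LRU if full
        if len(self.lines) > self.capacity:
            self.lines.pop(0)
        return False

def tile_access(addr, l0, tile_cache, counters):
    """Walk the hierarchy, counting which level serves each access."""
    if l0.access(addr):
        counters["l0"] += 1
    elif tile_cache.access(addr):
        counters["tile"] += 1
    else:
        counters["llc"] += 1       # falls through to the host LLC

counters = {"l0": 0, "tile": 0, "llc": 0}
l0 = ToyCache(capacity=4)          # small per-accelerator L0
tile = ToyCache(capacity=16)       # larger per-tile shared cache

# Phase 1: tight reuse of 3 addresses -- fits in L0 after warm-up.
for _ in range(10):
    for addr in range(3):
        tile_access(addr, l0, tile, counters)

# Phase 2: working set of 10 addresses -- thrashes the 4-entry L0,
# but the 16-entry tile cache captures the reuse and shields the LLC.
for _ in range(3):
    for addr in range(10):
        tile_access(addr, l0, tile, counters)

print(counters)
```

Running the model shows the intended division of labor: phase 1 is served almost entirely by the L0 after three cold misses, while in phase 2 the tile cache serves the repeated passes that the L0 is too small to hold, so only the first touch of each new line reaches the LLC.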
Keywords
design tradeoffs, chip designers, fixed-function coprocessors, multicore designs, energy efficiency, fine-grain offloading, function granularity, loop granularity, sequential program, program region, data sharing, energy cost, wires, caches, data movement, energy benefits, FUSION, lightweight coherent cache hierarchy, scratchpad-based architecture, CPU, energy saving, temporal coherence, accelerator tile, hit energy, per-tile shared cache, localized sharing, inter-accelerator sharing, dynamic energy, host-accelerator data ping-ponging