Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing

ACM Transactions on Computer Systems (2023)

Abstract
Sparse tensor algorithms are becoming widespread, particularly in the domains of deep learning, graph and data analytics, and scientific computing. Current high-performance broad-domain architectures, such as GPUs, often suffer memory system inefficiencies by moving too much data or moving it too far through the memory hierarchy. To increase performance and efficiency, proposed domain-specific accelerators tailor their architectures to the data needs of a narrow application domain, but as a result cannot be applied to the wide range of algorithms and applications that mix sparse and dense computation. This article proposes Symphony, a hybrid programmable/specialized architecture that focuses on the orchestration of data throughout the memory hierarchy to simultaneously reduce the movement of unnecessary data and data movement distances. Key elements of the Symphony architecture include (1) specialized reconfigurable units aimed not only at roofline floating-point computations but also at supporting data orchestration features, such as address generation, data filtering, and sparse metadata processing; and (2) distribution of computation resources (both programmable and specialized) throughout the on-chip memory hierarchy. We demonstrate that Symphony can match non-programmable ASIC performance on sparse tensor algebra and provide 31x improved runtime and 44x improved energy over a comparably provisioned GPU for these applications.
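To make the abstract's "sparse metadata processing" and "data filtering" concrete, the following is a minimal illustrative sketch (not code from the paper) of a compressed sparse row (CSR) matrix-vector multiply. The row-pointer and column-index arrays are exactly the kind of metadata whose traversal drives address generation in sparse tensor kernels, and skipping zeros is the data filtering that Symphony-style units specialize in hardware.

```python
# Illustrative CSR sparse matrix-vector multiply (y = A @ x).
# Not the paper's implementation: a software sketch of the
# metadata-driven access pattern such accelerators target.

def csr_spmv(row_ptr, col_idx, vals, x):
    """Multiply a CSR matrix by a dense vector; zeros are never fetched."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # row_ptr[i]..row_ptr[i+1] is metadata-driven address generation:
        # only the nonzero entries of row i are touched.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

# A = [[2, 0, 1],
#      [0, 0, 3]] in CSR form:
row_ptr = [0, 2, 3]
col_idx = [0, 2, 2]
vals = [2.0, 1.0, 3.0]
print(csr_spmv(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```

On a GPU, the indirect access `x[col_idx[k]]` is a major source of the memory-system inefficiency the abstract describes; placing address generation and filtering units throughout the memory hierarchy is Symphony's approach to shortening those data movements.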
Keywords
Sparse tensor algebra, data orchestration, accelerators