CPSAA: Accelerating Sparse Attention Using Crossbar-Based Processing-In-Memory Architecture

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2023)

Abstract
Attention-based neural networks attract great interest due to their excellent accuracy. However, the attention mechanism spends substantial computational effort on unnecessary calculations, significantly limiting system performance. To reduce these unnecessary calculations, researchers propose sparse attention, which converts some dense-dense matrix multiplication (DDMM) operations into sampled dense-dense matrix multiplication (SDDMM) and sparse matrix multiplication (SpMM) operations. However, current sparse attention solutions introduce massive off-chip random memory accesses because the sparse attention matrix is generally unstructured. We propose CPSAA, a novel crossbar-based processing-in-memory (PIM)-featured sparse attention accelerator that eliminates off-chip data transmissions. First, we present a novel attention calculation mode to balance crossbar writing and crossbar processing latency. Second, we design a novel PIM-based sparsity pruning architecture to eliminate the pruning phase's off-chip data transfers. Finally, we present novel crossbar-based SDDMM and SpMM methods that process unstructured sparse attention matrices by coupling two types of crossbar arrays. Experimental results show that CPSAA achieves average performance improvements of 89.6×, 32.2×, 17.8×, 3.39×, and 3.84×, and energy savings of 755.6×, 55.3×, 21.3×, 5.7×, and 4.9×, compared with GPU, FPGA, SANGER, ReBERT, and ReTransformer, respectively.
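To make the DDMM-to-SDDMM/SpMM conversion concrete, the following is a minimal NumPy sketch of sparse attention (not the paper's implementation; the mask, sizes, and variable names are illustrative assumptions). SDDMM samples only the score entries kept by an unstructured sparsity mask, and SpMM then multiplies the resulting sparse weight matrix by the dense value matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4                      # toy sequence length and head dimension
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

# Unstructured sparsity mask: True where a score must be computed.
mask = rng.random((n, n)) < 0.4
np.fill_diagonal(mask, True)     # ensure every row attends to something

# SDDMM: sample only the masked entries of Q @ K.T (row-by-row dot products),
# instead of computing the full dense score matrix (DDMM).
scores = np.full((n, n), -np.inf)
rows, cols = np.nonzero(mask)
scores[rows, cols] = np.einsum("ij,ij->i", Q[rows], K[cols])

# Row-wise softmax; exp(-inf) = 0, so pruned entries get zero weight.
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

# SpMM: the sparse attention-weight matrix multiplies the dense V.
out = weights @ V
```

The sampled entries match the corresponding entries of the dense product `Q @ K.T`, so the result equals masked dense attention while skipping the pruned score computations.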
Keywords
processing-in-memory,domain-specific accelerator,attention mechanism,ReRAM