Runtime Row/Column Activation Pruning for ReRAM-based Processing-in-Memory DNN Accelerators

2023 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Abstract
Resistive random access memory (ReRAM)-based processing-in-memory (PIM) DNN accelerators have shown great potential for improving model efficiency and saving energy. To further improve memory and computation efficiency, model weight sparsity has been widely exploited in ReRAM-based accelerator designs. However, these optimized accelerators have rarely exploited model activation sparsity. In this paper, we observe that the DNN activation matrix contains many sparse rows/columns whose removal has a negligible effect on accuracy, which we term insensitive rows/columns. Pruning them therefore offers significant potential to improve the performance and energy efficiency of DNN accelerators. We propose a new ReRAM-based PIM accelerator, named RapPIM, that exploits this activation sparsity. In RapPIM, we first propose an insensitive activation row/column pruning method that searches for and prunes the insensitive rows/columns. We then present an activation low-bit skipping strategy and a forward-propagation delay-hiding strategy to further improve performance and hide the latency that activation pruning adds to forward propagation. Our evaluations on several well-known DNN models show that RapPIM achieves up to 2.40x speedup and 44.82% power reduction compared with a state-of-the-art ReRAM-based accelerator.
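To make the core idea concrete, below is a minimal NumPy sketch of activation row/column pruning. It assumes a simple per-row/per-column nonzero-ratio threshold as the insensitivity criterion; the paper's actual search method is not detailed in the abstract, so the function name `prune_insensitive_rows_cols` and the threshold parameter `tau` are hypothetical, for illustration only.

```python
import numpy as np

def prune_insensitive_rows_cols(act: np.ndarray, tau: float = 0.1):
    """Drop activation rows/columns whose nonzero ratio falls below tau.

    Illustrative stand-in for the paper's insensitive row/column search:
    rows/columns that are mostly zero are assumed to contribute little
    to the matrix-vector products mapped onto ReRAM crossbars.
    """
    nz_row = np.count_nonzero(act, axis=1) / act.shape[1]  # density per row
    nz_col = np.count_nonzero(act, axis=0) / act.shape[0]  # density per column
    keep_rows = nz_row >= tau  # rows dense enough to keep
    keep_cols = nz_col >= tau  # columns dense enough to keep
    pruned = act[np.ix_(keep_rows, keep_cols)]
    return pruned, keep_rows, keep_cols

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic sparse activation matrix (~70% zeros)
    act = rng.random((8, 8)) * (rng.random((8, 8)) > 0.7)
    pruned, keep_rows, keep_cols = prune_insensitive_rows_cols(act, tau=0.2)
    print(act.shape, "->", pruned.shape)
```

In a real accelerator the kept-row/column masks would also be applied to the corresponding weight crossbar rows/columns so that the pruned activations are never streamed into the array.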
Keywords
Processing-in-memory, ReRAM, Activation Pruning, Neural Network