AUTO-PRUNE: automated DNN pruning and mapping for ReRAM-based accelerator

International Conference on Supercomputing (2021)

Abstract
Emergent ReRAM-based accelerators support in-memory computation to accelerate deep neural network (DNN) inference. Weight matrix pruning of DNNs is a widely used technique to reduce model size, thereby reducing the resource and energy consumption of ReRAM-based accelerators. However, prior work on weight matrix pruning for ReRAM-based accelerators has three major issues. First, it relies on heuristics or rules from domain experts to prune the weights, leading to suboptimal pruning policies. Second, it mostly focuses on improving the compression ratio and thus may not meet accuracy constraints. Third, it ignores direct feedback from the hardware. In this paper, we introduce an automated DNN pruning and mapping framework, named AUTO-PRUNE. It leverages reinforcement learning (RL) to automatically determine the pruning policy under a constraint on accuracy loss. The reward function of the RL agent is designed using the hardware's direct feedback (i.e., accuracy and the compression rate of occupied crossbars). The reward directs the search for each layer's pruning ratio toward a global optimum, accounting for the characteristics of the individual layers of the DNN model. AUTO-PRUNE then maps the pruned weight matrices to crossbars so that only nontrivial elements are stored. Finally, to avoid the dislocation problem, we design a new datapath in ReRAM-based accelerators that correctly indexes and feeds inputs to the matrix-vector computation, leveraging the mechanism of the operation units. Experimental results show that, compared to the state-of-the-art, AUTO-PRUNE achieves up to a 3.3X compression rate, 3.1X area efficiency, and 3.3X energy efficiency with similar or even higher accuracy.
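
The reward design described in the abstract lends itself to a short illustration. Below is a minimal Python sketch of a per-rollout reward that returns the crossbar compression rate when the accuracy-loss constraint is met and a penalty otherwise. All names (pruning_reward, max_accuracy_loss, and the example numbers) are hypothetical assumptions for illustration, not the paper's actual implementation.

    # Hypothetical sketch of an RL reward combining hardware feedback:
    # measured accuracy and the compression rate of occupied crossbars.
    # Names and values are illustrative, not taken from AUTO-PRUNE itself.

    def pruning_reward(accuracy, baseline_accuracy,
                       occupied_crossbars, baseline_crossbars,
                       max_accuracy_loss=0.01):
        """Reward for one pruning-policy rollout.

        accuracy           -- top-1 accuracy of the pruned model
        baseline_accuracy  -- accuracy of the unpruned model
        occupied_crossbars -- crossbars occupied after pruning and mapping
        baseline_crossbars -- crossbars occupied by the dense model
        max_accuracy_loss  -- the accuracy-loss constraint
        """
        accuracy_loss = baseline_accuracy - accuracy
        if accuracy_loss > max_accuracy_loss:
            # Violating the accuracy constraint yields a negative reward,
            # steering the agent back toward feasible pruning ratios.
            return -accuracy_loss
        # Fewer occupied crossbars -> higher compression rate -> higher reward.
        return baseline_crossbars / occupied_crossbars

    # Example: pruning costs 0.4% accuracy but frees two-thirds of the
    # crossbars, so the reward equals the 3.0x compression rate.
    print(pruning_reward(0.756, 0.760,
                         occupied_crossbars=96,
                         baseline_crossbars=288))  # -> 3.0

Under this kind of reward, a layer whose weights tolerate aggressive pruning contributes a high compression term, while a sensitive layer triggers the accuracy penalty, which is consistent with the abstract's per-layer search for a global optimum.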