CMQ: Crossbar-Aware Neural Network Mixed-Precision Quantization via Differentiable Architecture Search

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2022)

Abstract
RRAM-based accelerators have become popular candidates for neural network acceleration because they perform matrix-vector multiplication in memory with high storage density and low latency. Many related works use fixed-precision quantization to compress models and improve tolerance to process variation, but these methods still suffer large accuracy degradation and poor robustness to nonideal effects. In this work, we propose a crossbar-aware mixed-precision quantization scheme that searches for the optimal precision of each part of the network to improve accuracy and robustness to noise. First, we introduce a group quantization strategy that flexibly and dynamically adjusts the group size according to the crossbar size. Then, we propose a detailed mixed-precision search flow to find the optimal precision set for the network. Finally, we present a noise-injection adaption training method to enhance noise tolerance. Experimental results show that the proposed method improves inference accuracy by at least 2.04% over fixed-precision quantization at the same resource cost. The most accurate (MA) searched architecture achieves an accuracy of 92.39% and a resource saving of 93.30% compared to the full-precision model, while the most efficient (ME) searched architecture achieves an accuracy of 91.11% and a resource saving of 95.57%; the average precision of the ME mixed-precision architecture is only 1.4 bits. Moreover, the results show that the mixed-precision network with noise-adaption training is more robust to noise than the fixed-precision network with noise-adaption training.
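The group quantization strategy described above can be illustrated with a minimal sketch: weights are split into groups whose size matches the crossbar dimension, and each group is quantized with its own scale and (possibly different) bit-width. The function name, the symmetric uniform quantizer, and the per-group bit assignment below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def group_quantize(weights, crossbar_size, bits):
    """Sketch of crossbar-sized group quantization (illustrative only).

    weights: flat 1-D array of weights.
    crossbar_size: group size, assumed equal to the crossbar dimension.
    bits: per-group bit-widths, one entry per group (mixed precision).
    Each group is quantized symmetrically with its own scale.
    """
    out = np.empty_like(weights, dtype=np.float64)
    group_starts = range(0, len(weights), crossbar_size)
    for start, b in zip(group_starts, bits):
        g = weights[start:start + crossbar_size]
        levels = 2 ** (b - 1) - 1          # symmetric signed levels
        max_abs = np.max(np.abs(g))
        scale = max_abs / levels if max_abs > 0 else 1.0
        out[start:start + crossbar_size] = np.round(g / scale) * scale
    return out

# Example: 256 weights mapped onto two 128-wide crossbars,
# one at 8 bits and one at 2 bits.
w = np.linspace(-1.0, 1.0, 256)
wq = group_quantize(w, crossbar_size=128, bits=[8, 2])
```

A mixed-precision search such as the one in the paper would then choose the `bits` vector per group (e.g., via differentiable architecture search) to trade accuracy against crossbar resource cost.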
Keywords
Deep neural network (DNN) inference accelerator, mixed-precision quantization, neural architecture search, RRAM