Quantization-aware Optimization Approach for CNNs Inference on CPUs

Jiasong Chen, Zeming Xie, Weipeng Liang, Bosheng Liu, Xin Zheng, Jigang Wu, Xiaoming Xiong

2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), 2024

Abstract
Data movement through the memory hierarchy is a fundamental bottleneck in the majority of convolutional neural network (CNN) deployments on CPUs. Loop-level optimization and hybrid-bitwidth quantization are two representative optimization approaches for reducing memory accesses. However, they are typically applied independently because combining them greatly increases the complexity of design space exploration. We present QAOpt, a quantization-aware optimization approach that reduces this complexity when combining both techniques for CNN deployments on CPUs. We develop a bitwidth-sensitive quantization strategy that trades off model accuracy against data movement when applying loop-level optimization together with mixed-precision quantization. We also provide a quantization-aware pruning process that shrinks the design space for higher search efficiency. Evaluation results demonstrate that our work achieves better energy efficiency with acceptable accuracy loss.
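The abstract's core trade-off, that lower bitwidths cut data movement but raise quantization error, can be illustrated with a minimal sketch. The functions and traffic model below are hypothetical illustrations, not the paper's actual method: symmetric uniform quantization at a chosen bitwidth, plus a rough per-layer byte-traffic estimate.

```python
import numpy as np

def quantize(x, bits):
    """Symmetric uniform quantization of a float tensor to a given bitwidth."""
    qmax = 2 ** (bits - 1) - 1
    peak = float(np.abs(x).max())
    scale = peak / qmax if peak > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale, scale  # dequantized tensor and its scale factor

def layer_traffic_bytes(n_weights, n_activations, w_bits, a_bits):
    """Rough data-movement estimate for one layer: bytes moved for
    weights plus activations (ignores reuse from loop-level tiling)."""
    return n_weights * w_bits / 8 + n_activations * a_bits / 8

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
for bits in (8, 4):
    wq, _ = quantize(w, bits)
    mse = float(np.mean((w - wq) ** 2))
    traffic = layer_traffic_bytes(len(w), 4096, bits, bits)
    print(f"{bits}-bit: mse={mse:.5f}, traffic={traffic:.0f} B")
```

Halving the bitwidth halves the estimated traffic but increases reconstruction error; a bitwidth-sensitive strategy such as QAOpt's would search this trade-off jointly with loop-level choices rather than fixing one side first.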