Feature Map Alignment - Towards Efficient Design of Mixed-precision Quantization Scheme.

VCIP 2019

Abstract
Quantization is known as an effective compression method for deploying neural networks on mobile devices. However, most existing works train a quantized network from scratch with a universal bitwidth for all layers, making it hard to find the optimal trade-off between compression ratio and inference accuracy. In this paper, we propose a novel post-training quantization approach that derives a flexible bitwidth scheme. Our algorithm progressively downgrades the bitwidth of a chosen layer in the network and performs feature map alignment with the pre-trained model. The algorithm comprises a layer-sensitivity meter and an iterative quantizer. Specifically, the meter dynamically estimates, for every layer, the quantization error on its output feature map, and this error serves as the objective function minimized by the quantizer. Extensive experiments on the CIFAR-10 and ImageNet ILSVRC2012 datasets demonstrate that the proposed approach achieves impressive results for mainstream neural networks.
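
The abstract only outlines the algorithm, so below is a minimal sketch of the greedy loop it describes, assuming a PyTorch setting. All names (quantize_weights, feature_map_error, progressive_downgrade), the uniform symmetric quantizer, and the stopping tolerance are hypothetical stand-ins; the paper's actual iterative quantizer, which minimizes the feature-map error directly, is not reproduced here. The sketch only illustrates the two components the abstract names: a meter that measures per-layer quantization error on the output feature map, and a loop that progressively downgrades the bitwidth of the layer that tolerates it best.

```python
import copy
import torch
import torch.nn as nn

def quantize_weights(weight: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric quantization of a weight tensor to the given bitwidth
    (a simple stand-in for the paper's quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().max() / qmax
    if scale == 0:
        return weight.clone()
    return torch.round(weight / scale).clamp(-qmax, qmax) * scale

def feature_map_error(fp_model: nn.Module, q_model: nn.Module,
                      layer_name: str, x: torch.Tensor) -> float:
    """The 'meter': mean squared error between the full-precision and
    quantized output feature maps of the chosen layer."""
    feats = {}

    def make_hook(tag):
        def hook(_module, _inputs, output):
            feats[tag] = output.detach()
        return hook

    h_fp = dict(fp_model.named_modules())[layer_name].register_forward_hook(make_hook("fp"))
    h_q = dict(q_model.named_modules())[layer_name].register_forward_hook(make_hook("q"))
    with torch.no_grad():
        fp_model(x)
        q_model(x)
    h_fp.remove()
    h_q.remove()
    return (feats["fp"] - feats["q"]).pow(2).mean().item()

def progressive_downgrade(model: nn.Module, calib_x: torch.Tensor,
                          start_bits: int = 8, min_bits: int = 2,
                          tolerance: float = 1e-3) -> dict:
    """Greedy post-training pass: repeatedly lower the bitwidth of the layer
    whose output feature map is least disturbed, stopping once every further
    downgrade would exceed the error tolerance."""
    fp_model = copy.deepcopy(model).eval()   # frozen pre-trained reference
    q_model = copy.deepcopy(model).eval()    # model being quantized
    layers = {n: m for n, m in q_model.named_modules()
              if isinstance(m, (nn.Conv2d, nn.Linear))}
    orig_w = {n: m.weight.data.clone() for n, m in layers.items()}
    bits = {n: start_bits for n in layers}

    # Start from a uniform bitwidth, then downgrade layer by layer.
    for name, module in layers.items():
        module.weight.data = quantize_weights(orig_w[name], start_bits)

    while True:
        best_name, best_err = None, None
        for name, module in layers.items():
            if bits[name] <= min_bits:
                continue
            current = module.weight.data.clone()
            module.weight.data = quantize_weights(orig_w[name], bits[name] - 1)
            err = feature_map_error(fp_model, q_model, name, calib_x)
            module.weight.data = current  # undo the trial before probing the next layer
            if best_err is None or err < best_err:
                best_name, best_err = name, err
        if best_name is None or best_err > tolerance:
            break
        bits[best_name] -= 1
        layers[best_name].weight.data = quantize_weights(
            orig_w[best_name], bits[best_name])
    return bits
```

A small calibration batch calib_x drives both forward passes; since the approach is post-training, no labels or gradient updates are needed, which is the main practical appeal of this style of mixed-precision search.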
Keywords
mobile multimedia, compression, quantization, flexible bitwidth