CSMPQ: Class Separability Based Mixed-Precision Quantization.

ICIC (1) 2023

Abstract
Network quantization has become increasingly popular due to its ability to reduce storage requirements and accelerate inference. However, ultra-low-bit quantization remains challenging due to significant performance degradation. Mixed-precision quantization, which assigns different bit-widths to different layers, has been introduced to achieve speedup while preserving accuracy as much as possible. However, existing methods either focus on the sensitivity of individual network layers, neglecting the intrinsic attributes of activations, or rely on a time-consuming reinforcement learning or neural architecture search process to obtain the optimal bit-width configuration. To address these limitations, we propose a new mixed-precision quantization method based on the class separability of layer-wise feature maps. Specifically, we extend the widely used term frequency-inverse document frequency (TF-IDF) measure to quantify the class separability of layer-wise feature maps, and we observe that layers with lower class separability can be quantized to lower bit-widths. We then formulate a linear programming problem to derive the optimal bit configuration. Without any iterative process, the proposed method, CSMPQ, achieves better compression trade-offs than state-of-the-art quantization algorithms. Specifically, with Quantization-Aware Training, CSMPQ achieves 73.03% Top-1 accuracy on ResNet-18 with only 63G BOPs, and with Post-Training Quantization it achieves 71.30% Top-1 accuracy on MobileNetV2 at 1.5 Mb.
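The abstract describes allocating bit-widths via a linear program guided by per-layer class separability. The paper does not give the exact formulation here, so the following is a minimal sketch under assumed details: we maximize the separability-weighted sum of bit-widths subject to a total size budget, with each layer's bits bounded between 2 and 8. The function name `allocate_bits`, the budget constraint, and the toy separability scores are all hypothetical illustrations, not the authors' code.

```python
# Hypothetical sketch of LP-based bit allocation (assumed formulation,
# not the authors' implementation): layers with higher class separability
# receive more bits, subject to a total model-size budget.
import numpy as np
from scipy.optimize import linprog


def allocate_bits(separability, layer_sizes, budget, b_min=2, b_max=8):
    """Maximize sum_i separability[i] * bits[i]
    subject to sum_i layer_sizes[i] * bits[i] <= budget
    and b_min <= bits[i] <= b_max for every layer i."""
    s = np.asarray(separability, dtype=float)
    sizes = np.asarray(layer_sizes, dtype=float)
    res = linprog(
        c=-s,                 # linprog minimizes, so negate to maximize
        A_ub=sizes[None, :],  # single budget constraint over all layers
        b_ub=[budget],
        bounds=[(b_min, b_max)] * len(s),
        method="highs",
    )
    return res.x


# Toy example: three equal-size layers, budget of 15 size-weighted bits.
bits = allocate_bits([0.9, 0.1, 0.5], [1.0, 1.0, 1.0], budget=15.0)
print(np.round(bits, 3))  # → [8. 2. 5.]: more separable layers get more bits
```

The continuous LP solution would still need rounding to hardware-supported bit-widths in practice; the fractional relaxation above just illustrates how the budget trades bits toward high-separability layers.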
Keywords
class separability, mixed-precision