MPQ-YOLO: Ultra low mixed-precision quantization of YOLO for edge devices deployment

Neurocomputing (2024)

Abstract
You Only Look Once (YOLO), known for its real-time performance and outstanding accuracy, has emerged as a prominent framework for object detection tasks. However, deploying YOLO on resource-constrained edge devices poses challenges due to its substantial memory requirements. In this paper, we propose MPQ-YOLO, an ultra-low mixed-precision quantization framework designed for edge device deployment. The core idea is to integrate 1-bit backbone quantization and 4-bit head quantization with dedicated training techniques. Specifically, we analyze the effect of numerical distribution on the performance of binary neural networks (BNNs), and based on this, we design a backbone with only 1-bit convolution. Then, we introduce a trainable scale and a Progressive Network Quantization (PNQ) training strategy to bridge the backbone and head for end-to-end quantization training. The former is applied to both weights and activations within the 4-bit head, enabling effective gradient propagation. The latter mitigates the oscillation caused by mixed-precision training, promoting smoother training and faster model convergence. Extensive experiments on the VOC and COCO datasets demonstrate that MPQ-YOLO achieves a good trade-off between model compression and detection performance. Specifically, compared to the full-precision model, MPQ-YOLO achieves compression of up to 16.3x and 14.2x in terms of computational complexity and model size, respectively, while maintaining relatively high detection accuracy, i.e., 74.7% on VOC and 51.5% on COCO. To the best of our knowledge, MPQ-YOLO is the first YOLO framework with dual low mixed-precision quantization. Moreover, compared to existing layer-wise mixed-precision quantization methods, which cause redundant data processing and massive data movement, MPQ-YOLO offers a more hardware-designer-friendly and straightforward solution through efficient resource utilization and reuse.
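The two quantizers the abstract pairs together can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the function names are illustrative, the 1-bit quantizer follows the common BNN formulation (sign of the weight scaled by the mean absolute value, to preserve the numerical distribution the abstract analyzes), and the 4-bit quantizer is a standard uniform signed quantizer whose `scale` stands in for the trainable scale applied to the head's weights and activations.

```python
def binarize(weights):
    """1-bit quantization as commonly used in BNN backbones:
    sign(w) scaled by the mean absolute value, so the binary tensor
    roughly matches the full-precision numerical distribution."""
    alpha = sum(abs(w) for w in weights) / len(weights)  # per-tensor scaling factor
    return [alpha if w >= 0 else -alpha for w in weights]

def quantize_4bit(values, scale):
    """4-bit uniform quantization with a scale factor. In quantization-aware
    training the scale would be a learnable parameter updated by gradient
    descent (gradients passed through with a straight-through estimator);
    here it is just a fixed float. Signed 4-bit integers span [-8, 7]."""
    quantized = [max(-8, min(7, round(v / scale))) for v in values]
    return [q * scale for q in quantized]  # dequantized ("fake-quantized") values
```

In actual quantization-aware training both functions would be applied inside the forward pass, with non-differentiable rounding and sign operations bypassed by a straight-through estimator so gradients can reach the latent full-precision weights and the trainable scale.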
Keywords
Object detection, Model compression, Mixed-precision quantization, Binary neural network, Quantization-aware training