Training Quantized Neural Networks With a Full-Precision Auxiliary Module

    CVPR 2020.

    Keywords:
    full-precision auxiliary module, stochastic gradient descent, neural architecture search, auxiliary module, knowledge distillation

    Abstract:

    In this paper, we seek to tackle a key challenge in training low-precision networks: the notorious difficulty of propagating gradients through a low-precision network due to the non-differentiable quantization function. We propose a solution: train the low-precision network with a full-precision auxiliary module. Specifically, during training the auxiliary module is combined with the low-precision network to form a mixed-precision network that is jointly optimized, so that the full-precision module provides direct hierarchical gradients to the low-precision network during back-propagation.
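    To make the stated challenge concrete, the sketch below (a hypothetical illustration, not the paper's code) implements a k-bit uniform quantizer whose rounding step has zero gradient almost everywhere, together with the straight-through estimator (STE) that is commonly used to push gradients through it; the coarseness of this approximation is what motivates training with a full-precision auxiliary module.

```python
import torch


class UniformQuantizeSTE(torch.autograd.Function):
    """k-bit uniform quantizer with a straight-through estimator (STE).

    Forward: round inputs in [0, 1] onto 2**k - 1 uniform levels; the rounding
    step is non-differentiable (zero gradient almost everywhere).
    Backward: pass the incoming gradient straight through, the usual workaround
    for the non-differentiable quantization function.
    """

    @staticmethod
    def forward(ctx, x, k):
        levels = 2 ** k - 1
        return torch.round(x.clamp(0, 1) * levels) / levels

    @staticmethod
    def backward(ctx, grad_output):
        # STE: treat the quantizer as the identity on the backward pass.
        return grad_output, None


if __name__ == "__main__":
    x = torch.rand(4, requires_grad=True)
    y = UniformQuantizeSTE.apply(x, 2)   # 2-bit quantization
    y.sum().backward()
    print(y, x.grad)                     # gradient is all ones: the rounding step is ignored
```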

    Introduction
    • Deep neural networks (DNNs) have made great strides in many computer vision tasks such as image classification [11, 20], segmentation [8, 10] and detection [37, 40].
    • The authors cannot directly optimize the discretised network with standard stochastic gradient descent, because the quantization function is non-differentiable
    Highlights
    • Deep neural networks (DNNs) have made great strides in many computer vision tasks such as image classification [11, 20], segmentation [8, 10] and detection [37, 40]
    • Through extensive experiments on the COCO benchmark, we show that our 4-bit models can achieve near-lossless performance compared to the full-precision model, which is of significant practical value
    • We have proposed an auxiliary learning strategy to tackle the non-differentiable quantization process in training low-bitwidth convolutional neural networks
    • The auxiliary module is combined with the low-precision network to form a mixed-precision network, which is jointly optimized with the low-precision model
    • The full-precision auxiliary module can provide direct hierarchical gradients during back-propagation to assist the optimization of the low-precision network (a training sketch follows this list)
    • We have conducted extensive experiments based on various quantization approaches and observed consistent performance gains on image classification and object detection
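    As referenced in the list above, here is a minimal, hedged PyTorch sketch of the auxiliary learning idea: full-precision auxiliary heads branch off after each low-precision block, so their losses feed gradients directly into intermediate features at several depths. Layer sizes, the pooling, and the head design are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize_ste(x, k=4):
    """k-bit uniform quantization of values in [0, 1] with a straight-through estimator."""
    levels = 2 ** k - 1
    xq = torch.round(torch.clamp(x, 0, 1) * levels) / levels
    return x + (xq - x).detach()  # forward: quantized value, backward: identity


class QuantBlock(nn.Module):
    """A toy 'low-precision' block: its output activations are quantized via the STE."""

    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return quantize_ste(torch.sigmoid(self.conv(x)))


class AuxQuantNet(nn.Module):
    """Hypothetical sketch of training with full-precision auxiliary heads.

    After every low-precision block a full-precision head makes its own
    prediction, so back-propagation injects full-precision ("hierarchical")
    gradients into the intermediate low-precision features. The auxiliary
    heads are used during training to assist optimization.
    """

    def __init__(self, ch=16, num_classes=10, depth=3):
        super().__init__()
        self.stem = nn.Conv2d(3, ch, 3, padding=1)
        self.blocks = nn.ModuleList([QuantBlock(ch) for _ in range(depth)])
        self.aux_heads = nn.ModuleList([nn.Linear(ch, num_classes) for _ in range(depth)])
        self.main_head = nn.Linear(ch, num_classes)

    def forward(self, x):
        x = self.stem(x)
        aux_logits = []
        for block, head in zip(self.blocks, self.aux_heads):
            x = block(x)
            aux_logits.append(head(x.mean(dim=(2, 3))))  # full-precision auxiliary branch
        return self.main_head(x.mean(dim=(2, 3))), aux_logits


if __name__ == "__main__":
    net = AuxQuantNet()
    images, labels = torch.randn(2, 3, 32, 32), torch.tensor([1, 3])
    logits, aux_logits = net(images)
    loss = F.cross_entropy(logits, labels)
    loss = loss + sum(F.cross_entropy(a, labels) for a in aux_logits)
    loss.backward()  # auxiliary losses back-propagate into every quantized block
```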
    Methods
    • The authors describe the proposed learning strategy for training a low-precision network.
    • When only the backbone network is quantized, the authors do not observe an AP drop, at least with ResNet-18.
    • This indicates that a 4-bit backbone can encode sufficiently accurate features for further decoding.
    • The authors incorporate the proposed auxiliary learning strategy to assist the convergence of the quantized detector.
    • Each head should learn independent parameters to transform the feature at the corresponding pyramid level for classification and regression (see the sketch below).
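    The sketch referenced in the last bullet: a hypothetical illustration of giving each pyramid level its own classification and regression head, instead of sharing one head across levels. Channel counts, the number of levels, anchors and classes are assumptions, and this is not the authors' detector code.

```python
import torch
import torch.nn as nn


class PerLevelHeads(nn.Module):
    """One independent classification/regression head per pyramid level.

    In many detectors a single head is shared across all feature levels; here
    each level gets its own parameters so it can learn a level-specific
    transform, as suggested in the Methods bullet above. All sizes below are
    illustrative assumptions.
    """

    def __init__(self, in_ch=256, num_levels=5, num_anchors=9, num_classes=80):
        super().__init__()
        self.cls_heads = nn.ModuleList(
            [nn.Conv2d(in_ch, num_anchors * num_classes, 3, padding=1) for _ in range(num_levels)]
        )
        self.reg_heads = nn.ModuleList(
            [nn.Conv2d(in_ch, num_anchors * 4, 3, padding=1) for _ in range(num_levels)]
        )

    def forward(self, features):
        # features: list of per-level feature maps from the (quantized) backbone + FPN
        cls_outs = [head(f) for head, f in zip(self.cls_heads, features)]
        reg_outs = [head(f) for head, f in zip(self.reg_heads, features)]
        return cls_outs, reg_outs


if __name__ == "__main__":
    feats = [torch.randn(1, 256, s, s) for s in (64, 32, 16, 8, 4)]
    cls_outs, reg_outs = PerLevelHeads()(feats)
    print([o.shape for o in cls_outs])
```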
    Conclusion
    • The authors have proposed an auxiliary learning strategy to tackle the non-differentiable quantization process in training low-bitwidth convolutional neural networks.
    • The auxiliary module is combined with the low-precision network to form a mixed-precision network, which is jointly optimized with the low-precision model.
    • In this way, the full-precision auxiliary module can provide direct hierarchical gradients during back-propagation to assist the optimization of the low-precision network.
    • The authors have worked on quantized object detection and proposed several practical solutions.
    • The authors have achieved near-lossless results using 4-bit detectors.
    Tables
    • Table 1: Accuracy (%) of the compared methods on the ImageNet validation set
    • Table 2: Accuracy (%) of different supervision strategies on the ImageNet validation set, based on 2-bit DoReFa-Net with ResNet-18, ResNet-34 and ResNet-50
    • Table 3: Accuracy (%) of the proposed approaches on the ImageNet validation set. All cases are 2-bit and without skip connections, except for the baselines. We can observe that the auxiliary module significantly improves the plain-network performance
    • Table 4: Accuracy (%) of 2-bit DoReFa-Net using ResNet-18 on the CIFAR-100 dataset
    • Table 5: Accuracy (%) when using different adaptors. We use DoReFa-Net on ImageNet as our baseline
    • Table 6: Performance on the COCO validation set with 4-bit quantization
    • Table 7: Ablation studies on the COCO validation set with 4-bit quantization
    Related work
    • Network quantization. A quantized network represents weights and activations with very low precision, yielding highly compact DNN models compared to their floating-point counterparts. Moreover, the convolution operations can be efficiently computed via bitwise operations. Quantization approaches can be categorized into fixed-point quantization and binary neural networks (BNNs), and fixed-point quantization can further be divided into uniform and non-uniform schemes. Uniform approaches [17, 54, 57] design quantizers with a constant quantization step. To reduce the quantization error, non-uniform strategies [4, 52] propose to learn the quantization intervals by jointly optimizing parameters and quantizers. A fundamental problem in quantization is approximating the gradient of the non-differentiable quantizer. To address this, some works have studied relaxed quantization [1, 31, 47, 57]. Moreover, with the popularity of automated machine learning, some recent literature employs reinforcement learning to search for the optimal bitwidth for each layer [6, 46, 49]. BNNs [14, 36] constrain both weights and activations to binary values (i.e., +1 or −1), which brings great benefits to specialized hardware devices. The development of BNNs can be classified into two categories: (i) work that focuses on improving the training of BNNs [13, 30, 36, 45]; and (ii) work that uses multiple binarizations to approximate the full-precision tensor or structure [9, 23, 27, 28, 45, 58]. In this paper, we propose a general auxiliary learning approach that can work with all categories of quantization approaches.
    • Weight sharing. Weight sharing has been attracting increasing attention as a route to efficient yet accurate computation. In visual recognition, the region proposal network (RPN) in Faster R-CNN [40] and Mask R-CNN [10] shares the same backbone with the task-specific networks, which greatly saves testing time. For neural architecture search, ENAS [35] allows parameters to be shared among all architectures in the search space, saving orders of magnitude of GPU hours. In network compression, weight/activation quantization partitions the weight/activation distribution into clusters and uses the cluster centers as the possible discrete values; this strategy can be interpreted as a special case of weight sharing (a small sketch of this view follows below). Different from these approaches, we propose to utilize weight sharing to jointly optimize the full-precision auxiliary module and the original low-precision network, so as to improve the accuracy of the quantized model.
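    The sketch mentioned above: a generic illustration (not any specific cited method) of viewing quantization as weight sharing, where every weight snapped to the same cluster center ends up sharing a single discrete value.

```python
import torch


def cluster_quantize(weights, centers):
    """Snap each weight to its nearest cluster center.

    Quantization seen as weight sharing: all weights assigned to the same
    center share one discrete value. Generic illustration only.
    """
    flat = weights.reshape(-1, 1)                 # (N, 1)
    dist = (flat - centers.reshape(1, -1)).abs()  # (N, K) distance to each center
    nearest = dist.argmin(dim=1)                  # index of the closest center
    return centers[nearest].reshape(weights.shape)


if __name__ == "__main__":
    w = torch.randn(3, 3)
    centers = torch.tensor([-0.5, 0.0, 0.5])      # K = 3 shared values
    print(cluster_quantize(w, centers))           # output contains at most 3 distinct values
```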
    Funding
    • Tan was in part supported by Guangdong Provincial Scientific and Technological Funds under Grant 2018B010107001
    • This work was in part supported by ARC DP Project ‘Deep learning that scales’
    Reference
    • Yu Bai, Yu-Xiang Wang, and Edo Liberty. Proxquant: Quantized neural networks via proximal operators. In Proc. Int. Conf. Learn. Repren., 2019. 1, 2
    • Yoshua Bengio, Nicholas Leonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013. 1
    • Joseph Bethge, Marvin Bornstein, Adrian Loy, Haojin Yang, and Christoph Meinel. Training competitive binary neural networks from scratch. arXiv preprint arXiv:1812.01965, 2018. 6
    • Zhaowei Cai, Xiaodong He, Jian Sun, and Nuno Vasconcelos. Deep learning with low precision by half-wave gaussian quantization. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 5918–5926, 2017. 2
    • Guobin Chen, Wongun Choi, Xiang Yu, Tony Han, and Manmohan Chandraker. Learning efficient object detection models with knowledge distillation. In Proc. Adv. Neural Inf. Process. Syst., pages 742–751, 2017. 2
    • Yukang Chen, Gaofeng Meng, Qian Zhang, Xinbang Zhang, Liangchen Song, Shiming Xiang, and Chunhong Pan. Joint neural architecture search and quantization. arXiv preprint arXiv:1811.09426, 2018. 2
    • Ross Girshick. Fast r-cnn. In Proc. IEEE Int. Conf. Comp. Vis., pages 1440–1448, 2015. 2
    • Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 580–587, 2014. 1, 2
    • Yiwen Guo, Anbang Yao, Hao Zhao, and Yurong Chen. Network sketching: Exploiting binary structure in deep cnns. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 5955–5963, 2017. 2
    • Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. Mask r-cnn. In Proc. IEEE Int. Conf. Comp. Vis., pages 2980–2988, 2017. 1, 2
    • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 770–778, 2016. 1, 3, 6
    • Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. In Proc. Adv. Neural Inf. Process. Syst. Workshops, 2014. 2
    • Lu Hou, Quanming Yao, and James T Kwok. Loss-aware binarization of deep networks. In Proc. Int. Conf. Learn. Repren., 2017. 2
    • Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks. In Proc. Adv. Neural Inf. Process. Syst., pages 4107–4115, 2016. 2, 5
    • Seung Hyun Lee, Dae Ha Kim, and Byung Cheol Song. Self-supervised knowledge distillation using singular value decomposition. In Proc. Eur. Conf. Comp. Vis., 2018. 2
    • Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 2704–2713, 2018. 1
    • Sangil Jung, Changyong Son, Seohyung Lee, Jinwoo Son, Jae-Joon Han, Youngjun Kwak, Sung Ju Hwang, and Changkyu Choi. Learning to quantize deep networks by optimizing quantization intervals with task loss. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 4350–4359, 2019. 2, 5, 6, 8
    • Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Proc. Int. Conf. Learn. Repren., 2015. 6
    • Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009. 5
    • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Proc. Adv. Neural Inf. Process. Syst., pages 1097–1105, 2012. 1
    • Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu. Deeply-supervised nets. In Artificial Intelligence and Statistics, pages 562–570, 2015. 2
    • Rundong Li, Yan Wang, Feng Liang, Hongwei Qin, Junjie Yan, and Rui Fan. Fully quantized network for object detection. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 2810–2819, 2019. 1, 5, 7, 8
    • Zefan Li, Bingbing Ni, Wenjun Zhang, Xiaokang Yang, and Wen Gao. Performance guaranteed network acceleration via high-order residual quantization. In Proc. IEEE Int. Conf. Comp. Vis., pages 2584–2592, 2017. 2
    • Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 2117–2125, 2017. 5, 7
    • Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollar. Focal loss for dense object detection. In Proc. IEEE Int. Conf. Comp. Vis., pages 2980–2988, 2017. 2, 5, 7
    • Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In Proc. Eur. Conf. Comp. Vis., pages 740–755, 2014. 7
    • Xiaofan Lin, Cong Zhao, and Wei Pan. Towards accurate binary convolutional neural network. In Proc. Adv. Neural Inf. Process. Syst., pages 344–352, 2017. 2
    • Chunlei Liu, Wenrui Ding, Xin Xia, Baochang Zhang, Jiaxin Gu, Jianzhuang Liu, Rongrong Ji, and David Doermann. Circulant binary convolutional networks: Enhancing the performance of 1-bit dcnns with circulant back propagation. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 2691–2699, 2019. 2
    • Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Proc. Eur. Conf. Comp. Vis., pages 21–37, 2016. 2
    • Zechun Liu, Baoyuan Wu, Wenhan Luo, Xin Yang, Wei Liu, and Kwang-Ting Cheng. Bi-real net: Enhancing the performance of 1-bit cnns with improved representational capability and advanced training algorithm. In Proc. Eur. Conf. Comp. Vis., pages 722–737, 2018. 2, 5, 6
    • Christos Louizos, Matthias Reisser, Tijmen Blankevoort, Efstratios Gavves, and Max Welling. Relaxed quantization for discretized neural networks. In Proc. Int. Conf. Learn. Repren., 2019. 1, 2, 6
    • Asit Mishra and Debbie Marr. Apprentice: Using knowledge distillation techniques to improve low-precision network accuracy. In Proc. Int. Conf. Learn. Repren., 2018. 1, 4, 6
    • Vladimir Nekrasov, Hao Chen, Chunhua Shen, and Ian Reid. Fast neural architecture search of compact semantic segmentation models via auxiliary cells. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 9126–9135, 2019. 2
    • Wonpyo Park, Dongju Kim, Yan Lu, and Minsu Cho. Relational knowledge distillation. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2019. 2
    • Hieu Pham, Melody Y Guan, Barret Zoph, Quoc V Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. In Proc. Int. Conf. Mach. Learn., pages 4092–4101, 2018. 2
    • Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. In Proc. Eur. Conf. Comp. Vis., pages 525–542, 2016. 2
    • Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 779–788, 2016. 1, 2
    • Joseph Redmon and Ali Farhadi. Yolo9000: better, faster, stronger. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 7263–7271, 2017. 2
    • Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. 2
    • Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proc. Adv. Neural Inf. Process. Syst., pages 91–99, 2015. 1, 2
    • Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. Fitnets: Hints for thin deep nets. In Proc. Int. Conf. Learn. Repren., 2015. 2
    • Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. Int. J. Comp. Vis., 115(3):211–252, 2015. 5
    • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 1–9, 2015. 2
    • Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V Le. Mnasnet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2820–2828, 2019. 2
    • Wei Tang, Gang Hua, and Liang Wang. How to train a compact binary neural network with high accuracy? In Proc. AAAI Conf. on Arti. Intel., pages 2625–2631, 2017. 2
    • Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, and Song Han. Haq: Hardware-aware automated quantization. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2019. 2
    • Peisong Wang, Qinghao Hu, Yifan Zhang, Chunjie Zhang, Yang Liu, Jian Cheng, et al. Two-step quantization for low-bit neural networks. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 4376–4384, 2018. 2
    • Yi Wei, Xinyu Pan, Hongwei Qin, Wanli Ouyang, and Junjie Yan. Quantization mimic: Towards very tiny cnn for object detection. In Proc. Eur. Conf. Comp. Vis., pages 267–283, 2018. 1, 2
    • Bichen Wu, Yanghan Wang, Peizhao Zhang, Yuandong Tian, Peter Vajda, and Kurt Keutzer. Mixed precision quantization of convnets via differentiable neural architecture search. arXiv preprint arXiv:1812.00090, 2018. 2
    • Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2. https://github.com/facebookresearch/detectron2, 2019. 8
    • Sergey Zagoruyko and Nikos Komodakis. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In Proc. Int. Conf. Learn. Repren., 2017. 2
    • Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, and Gang Hua. Lq-nets: Learned quantization for highly accurate and compact deep neural networks. In Proc. Eur. Conf. Comp. Vis., pages 365–382, 2018. 2, 5, 6
    • Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid scene parsing network. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 2881–2890, 2017. 2
    • Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160, 2016. 2, 5, 6
    • Chenzhuo Zhu, Song Han, Huizi Mao, and William J Dally. Trained ternary quantization. Proc. Int. Conf. Learn. Repren., 2017. 5
    • Bohan Zhuang, Jing Liu, Mingkui Tan, Lingqiao Liu, Ian Reid, and Chunhua Shen. Effective training of convolutional neural networks with low-bitwidth weights and activations. arXiv preprint arXiv:1908.04680, 2019. 1, 4, 5, 6
    • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, and Ian Reid. Towards effective low-bitwidth convolutional neural networks. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 7920–7928, 2018. 1, 2, 4, 5, 6
    • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, and Ian Reid. Structured binary neural network for accurate image classification and semantic segmentation. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 413–422, 2019. 2, 5, 6