MnasNet: Platform-Aware Neural Architecture Search for Mobile

    CVPR 2019, pages 2820-2828. arXiv:1807.11626.

    Keywords:
    CNN model, trade-off, mobile model, object detection, neural network
    TL;DR:
    This paper presents an automated neural architecture search approach for designing resource-efficient mobile CNN models using reinforcement learning

    Abstract:

    Designing convolutional neural network (CNN) models for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant effort has been dedicated to designing and improving mobile models on all three dimensions, it is difficult to manually balance these trade-offs when there are so many architectural possibilities to consider.


    Introduction
    • Convolutional neural networks (CNN) have made significant progress in image classification, object detection, and many other applications.
    • As modern CNN models become increasingly deeper and larger [31, 13, 36, 26], they also become slower and require more computation.
    • Such increases in computational demands make it difficult to deploy state-of-the-art CNN models on resource-constrained platforms such as mobile or embedded devices.
    Highlights
    • Convolutional neural networks (CNN) have made significant progress in image classification, object detection, and many other applications
    • We propose an automated neural architecture search approach for designing mobile CNN models
    • In contrast to previous approaches, we introduce a novel factorized hierarchical search space that factorizes a CNN model into unique blocks and searches for the operations and connections per block separately, allowing different layer architectures in different blocks
    • This paper presents an automated neural architecture search approach for designing resource-efficient mobile CNN models using reinforcement learning
    • Our main ideas are incorporating platform-aware, real-world latency information into the search process and utilizing a novel factorized hierarchical search space to search for mobile models with the best trade-offs between accuracy and latency (see the reward sketch after this list)
    • On the ImageNet classification task, our MnasNet achieves 75.2% top-1 accuracy with 78ms latency on a Pixel phone, which is 1.8× faster than MobileNetV2 [29] with 0.5% higher accuracy and 2.3× faster than NASNet [36] with 1.2% higher accuracy
    • We demonstrate that our approach can automatically find significantly better mobile models than existing approaches, and achieve new state-of-the-art results on both ImageNet classification and COCO object detection under typical mobile inference latency constraints
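    As referenced above, the search optimizes accuracy and measured latency jointly rather than imposing a hard latency cutoff. Below is a minimal Python sketch of the paper's multi-objective reward, ACC(m) * (LAT(m)/T)^w, where the exponent w switches between alpha and beta depending on whether the measured latency meets the target T; alpha = beta = -0.07 is the soft-constraint setting reported in the paper, and the function name is illustrative.

        def mnas_reward(accuracy, latency_ms, target_ms=75.0, alpha=-0.07, beta=-0.07):
            """Multi-objective reward: reward = ACC * (LAT / T)^w.

            w = alpha if the model meets the latency target, else beta. With
            alpha = beta = -0.07 the penalty is smooth, so models slightly over
            the budget are discounted rather than discarded.
            """
            w = alpha if latency_ms <= target_ms else beta
            return accuracy * (latency_ms / target_ms) ** w

        # Example: 74.5% accuracy at 90 ms vs. 74.0% accuracy at 70 ms (target 75 ms).
        # mnas_reward(0.745, 90.0) -> ~0.736   (penalized for exceeding the target)
        # mnas_reward(0.740, 70.0) -> ~0.744   (slightly boosted for being faster)

    The paper also reports a hard-constraint variant (alpha = 0, beta = -1) that sharply penalizes models over budget; the soft setting above is the one used for the main results because it encourages exploring a range of latencies around the target.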
    Methods
    • Searching for CNN models on large tasks like ImageNet or COCO is expensive, as each model takes days to converge.
    • The authors directly perform the architecture search on the ImageNet training set but with fewer training steps (5 epochs).
    • The authors' controller samples about 8K models during the architecture search, but only the 15 top-performing models are transferred to the full ImageNet training and only 1 model is transferred to COCO (see the search-loop sketch below).
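    Putting the Methods bullets together, the search is a sample-evaluate-update loop. The sketch below is illustrative rather than the authors' code: controller, train_fn, latency_fn, and reward_fn are placeholder objects and callables standing in for the RNN controller (trained with Proximal Policy Optimization in the paper), the 5-epoch proxy training on ImageNet, on-device latency measurement, and the multi-objective reward.

        def architecture_search(controller, train_fn, latency_fn, reward_fn,
                                num_samples=8000, proxy_epochs=5, keep_top=15):
            """Sketch of the platform-aware search loop (illustrative, not the authors' code).

            controller -- proposes architectures via .sample() and learns via .update();
                          the paper trains an RNN controller with Proximal Policy Optimization
            train_fn   -- trains a sampled model for a few proxy epochs on ImageNet and
                          returns its validation accuracy
            latency_fn -- measures single-image latency of the model on the target phone
            reward_fn  -- combines accuracy and latency, e.g. ACC * (LAT / T)^w
            """
            history = []
            for _ in range(num_samples):
                arch = controller.sample()                 # propose a candidate model
                acc = train_fn(arch, epochs=proxy_epochs)  # cheap proxy evaluation (5 epochs)
                lat = latency_fn(arch)                     # measured latency, not a FLOPs proxy
                r = reward_fn(acc, lat)
                controller.update(arch, r)                 # policy-gradient step on the reward
                history.append((r, arch))
            # Only the top-performing candidates are trained to convergence on full
            # ImageNet, and a single one is transferred to COCO detection.
            history.sort(key=lambda item: item[0], reverse=True)
            return [arch for _, arch in history[:keep_top]]

    The key design choice is that latency_fn returns latency measured on the target phone rather than an estimate derived from FLOPs, which is what makes the search platform-aware.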
    Results
    • The authors study the performance of the models on ImageNet classification and COCO object detection, and compare them with other state-of-the-art mobile models.
    • Table 1 shows the performance of the models on ImageNet [28].
    • The authors set the target latency to T = 75 ms, similar to the latency of MobileNetV2.
    • Figure: accuracy versus inference latency under model scaling; in panel (b), input sizes 96, 128, 160, 192, and 224 correspond to the points from left to right (a channel-scaling sketch follows this list).
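    For reference, Table 4 contrasts this kind of model scaling with running a new search at a smaller latency target. A depth (width) multiplier such as the 0.35x used there simply shrinks every layer's channel count; the helper below is a minimal sketch following the MobileNet-family convention of rounding channels to a multiple of 8 (the rounding rule is an assumption about common practice, not taken from this paper).

        def scale_channels(base_channels, multiplier, divisor=8):
            """Scale a layer's channel count by a depth multiplier, keeping it a
            hardware-friendly multiple of `divisor` (MobileNet-style rounding)."""
            scaled = base_channels * multiplier
            rounded = max(divisor, int(scaled + divisor / 2) // divisor * divisor)
            if rounded < 0.9 * scaled:   # never round down by more than ~10%
                rounded += divisor
            return rounded

        # Example: a 320-channel layer under the 0.35x multiplier from Table 4.
        # scale_channels(320, 0.35) -> 112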
    Conclusion
    • This paper presents an automated neural architecture search approach for designing resource-efficient mobile CNN models using reinforcement learning.
    • The authors' main ideas are incorporating platform-aware real-world latency information into the search process and utilizing a novel factorized hierarchical search space to search for mobile models with the best trade-offs between accuracy and latency.
    • The resulting MnasNet architecture provides interesting findings on the importance of layer diversity, which will guide the design and improvement of future mobile CNN models (a per-block sketch of the search space follows).
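    The factorized hierarchical search space mentioned above decides, for each block of the network, its convolution type, kernel size, squeeze-and-excitation ratio, skip connection, output width, and number of repeated layers. The sketch below illustrates that structure; the option lists are simplified from the paper's description and the sampler is purely illustrative.

        import random

        # Per-block choices (simplified); blocks are sampled independently, which is
        # what allows different layer architectures in different parts of the network.
        BLOCK_CHOICES = {
            "conv_op":     ["conv", "depthwise_sep_conv", "mbconv3", "mbconv6"],  # mbconvN: inverted bottleneck, expansion N
            "kernel_size": [3, 5],
            "se_ratio":    [0.0, 0.25],            # squeeze-and-excitation ratio
            "skip_op":     ["none", "identity"],   # residual connection or not
            "num_layers":  [-1, 0, +1],            # repeat count, relative to a reference network
            "filters":     [0.75, 1.0, 1.25],      # output width, relative to a reference network
        }

        def sample_block():
            """Sample one block configuration from the per-block choice lists."""
            return {key: random.choice(options) for key, options in BLOCK_CHOICES.items()}

        def sample_network(num_blocks=7):
            """A candidate network is a sequence of independently sampled blocks."""
            return [sample_block() for _ in range(num_blocks)]

    Because each block is searched independently, the resulting network can use larger 5x5 kernels and squeeze-and-excitation in some blocks while keeping cheaper 3x3 blocks elsewhere, which is the layer diversity highlighted above.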
    Tables
    • Table 1: Performance Results on ImageNet Classification [28]. We compare our MnasNet models with both manually designed mobile models and other automated approaches. MnasNet-A1 is our baseline model; MnasNet-A2 and MnasNet-A3 are two models (for comparison) with different latency from the same architecture search experiment. #Params: number of trainable parameters; #Mult-Adds: number of multiply-add operations per image; Top-1/5 Acc.: the top-1 or top-5 accuracy on the ImageNet validation set; Inference Latency is measured on the big CPU core of a Pixel 1 phone with batch size 1
    • Table 2: Performance Study for Squeeze-and-Excitation (SE) [13]. MnasNet-A denotes the default MnasNet with SE in the search space; MnasNet-B denotes MnasNet with no SE in the search space
    • Table 3: Performance Results on COCO Object Detection. #Params: number of trainable parameters; #Mult-Adds: number of multiply-add operations per image; mAP: standard mean average precision on test-dev2017; mAP_S, mAP_M, mAP_L: mean average precision on small, medium, and large objects; Inference Latency: the inference latency on a Pixel 1 phone
    • Table 4: Model Scaling vs. Model Search. MobileNetV2 (0.35x) and MnasNet-A1 (0.35x) denote scaling the baseline models with depth multiplier 0.35; MnasNet-search1/2 denote models from a new architecture search that targets a 22ms latency constraint
    • Table 5: Comparison of Decoupled Search Space and Reward Design. Multi-obj denotes our multi-objective reward; Single-obj denotes optimizing only accuracy
    • Table 6: Performance Comparison of MnasNet and Its Variants. MnasNet-A1 denotes the model shown in Figure 7(a); the others are variants that repeat a single type of layer throughout the network. All models have the same number of layers and the same filter size at each layer
    Related work
    • Improving the resource efficiency of CNN models has been an active research topic over the last several years. Commonly used approaches include 1) quantizing the weights and/or activations of a baseline CNN model into lower-bit representations [8, 16], or 2) pruning less important filters, either according to FLOPs [6, 10] or according to platform-aware metrics such as the latency introduced in [32]. However, these methods are tied to a baseline model and do not focus on learning novel compositions of CNN operations.

      Another common approach is to directly hand-craft more efficient mobile architectures: SqueezeNet [15] reduces the number of parameters and computation by using lower-cost 1x1 convolutions and reducing filter sizes; MobileNet [11] extensively employs depthwise separable convolution to minimize computation density; ShuffleNets [33, 24] utilize low-cost group convolution and channel shuffle; CondenseNet [14] learns to connect group convolutions across layers. Recently, MobileNetV2 [29] achieved state-of-the-art results among mobile-size models by using resource-efficient inverted residuals and linear bottlenecks. Unfortunately, given the potentially huge design space, these hand-crafted models usually take significant human effort.

      Recently, there has been growing interest in automating the model design process using neural architecture search. These approaches are mainly based on reinforcement learning [35, 36, 1, 19, 25], evolutionary search [26], differentiable search [21], or other learning algorithms [19, 17, 23]. Although these methods can generate mobile-size models by repeatedly stacking a few searched cells, they do not incorporate mobile platform constraints into the search process or search space. Closely related to our work are MONAS [12], DPP-Net [3], RNAS [34], and Pareto-NASH [4], which attempt to optimize multiple objectives, such as model size and accuracy, while searching for CNNs, but their search processes optimize on small tasks like CIFAR. In contrast, this paper targets real-world mobile latency constraints and focuses on larger tasks such as ImageNet classification and COCO object detection.
    References
    [1] B. Baker, O. Gupta, N. Naik, and R. Raskar. Designing neural network architectures using reinforcement learning. ICLR, 2017.
    [2] K. Deb. Multi-objective optimization. Search Methodologies, pages 403-449, 2014.
    [3] J.-D. Dong, A.-C. Cheng, D.-C. Juan, W. Wei, and M. Sun. DPP-Net: Device-aware progressive search for Pareto-optimal neural architectures. ECCV, 2018.
    [4] T. Elsken, J. H. Metzen, and F. Hutter. Multi-objective architecture search for CNNs. arXiv preprint arXiv:1804.09081, 2018.
    [5] A. Gholami, K. Kwon, B. Wu, Z. Tai, X. Yue, P. Jin, S. Zhao, and K. Keutzer. SqueezeNext: Hardware-aware neural network design. ECV Workshop at CVPR, 2018.
    [6] A. Gordon, E. Eban, O. Nachum, B. Chen, H. Wu, T.-J. Yang, and E. Choi. MorphNet: Fast & simple resource-constrained structure learning of deep networks. CVPR, 2018.
    [7] P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
    [8] S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. ICLR, 2016.
    [9] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CVPR, pages 770-778, 2016.
    [10] Y. He, J. Lin, Z. Liu, H. Wang, L.-J. Li, and S. Han. AMC: AutoML for model compression and acceleration on mobile devices. ECCV, 2018.
    [11] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
    [12] C.-H. Hsu, S.-H. Chang, D.-C. Juan, J.-Y. Pan, Y.-T. Chen, W. Wei, and S.-C. Chang. MONAS: Multi-objective neural architecture search using reinforcement learning. arXiv preprint arXiv:1806.10332, 2018.
    [13] J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. CVPR, 2018.
    [14] G. Huang, S. Liu, L. van der Maaten, and K. Q. Weinberger. CondenseNet: An efficient DenseNet using learned group convolutions. CVPR, 2018.
    [15] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv preprint arXiv:1602.07360, 2016.
    [16] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko. Quantization and training of neural networks for efficient integer-arithmetic-only inference. CVPR, 2018.
    [17] K. Kandasamy, W. Neiswanger, J. Schneider, B. Poczos, and E. Xing. Neural architecture search with Bayesian optimisation and optimal transport. NeurIPS, 2018.
    [18] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. ECCV, 2014.
    [19] C. Liu, B. Zoph, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy. Progressive neural architecture search. ECCV, 2018.
    [20] H. Liu, K. Simonyan, O. Vinyals, C. Fernando, and K. Kavukcuoglu. Hierarchical representations for efficient architecture search. ICLR, 2018.
    [21] H. Liu, K. Simonyan, and Y. Yang. DARTS: Differentiable architecture search. ICLR, 2019.
    [22] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single shot multibox detector. ECCV, 2016.
    [23] R. Luo, F. Tian, T. Qin, and T.-Y. Liu. Neural architecture optimization. NeurIPS, 2018.
    [24] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. ECCV, 2018.
    [25] H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean. Efficient neural architecture search via parameter sharing. ICML, 2018.
    [26] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le. Regularized evolution for image classifier architecture search. AAAI, 2019.
    [27] J. Redmon and A. Farhadi. YOLO9000: Better, faster, stronger. CVPR, 2017.
    [28] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211-252, 2015.
    [29] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. MobileNetV2: Inverted residuals and linear bottlenecks. CVPR, 2018.
    [30] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
    [31] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. AAAI, 2017.
    [32] T.-J. Yang, A. Howard, B. Chen, X. Zhang, A. Go, V. Sze, and H. Adam. NetAdapt: Platform-aware neural network adaptation for mobile applications. ECCV, 2018.
    [33] X. Zhang, X. Zhou, M. Lin, and J. Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. CVPR, 2018.
    [34] Y. Zhou, S. Ebrahimi, S. O. Arık, H. Yu, H. Liu, and G. Diamos. Resource-efficient neural architect. arXiv preprint arXiv:1806.07912, 2018.
    [35] B. Zoph and Q. V. Le. Neural architecture search with reinforcement learning. ICLR, 2017.
    [36] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. Learning transferable architectures for scalable image recognition. CVPR, 2018.