NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection

    Computer Vision and Pattern Recognition, pp. 7036-7045, 2019.

    Keywords:
    Policy Optimization, backbone model, pyramidal feature representation, deep residual learning, architecture search

    Abstract:

    Current state-of-the-art convolutional architectures for object detection are manually designed. Here we aim to learn a better architecture of feature pyramid network for object detection. We adopt Neural Architecture Search and discover a new feature pyramid architecture in a novel scalable search space covering all cross-scale connections ...
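    The abstract's "cross-scale connections" can be made concrete with a small sketch. It assumes, following our reading of the paper, that a connection resizes two input feature maps to a chosen output pyramid level (nearest-neighbor upsampling to go finer, max pooling to go coarser) and combines them with an elementwise sum; the paper also considers a global-pooling combination, which is omitted here. Feature maps are single-channel Python lists, level l is assumed to have stride 2**l, and all function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # one channel of a feature map, indexed [row][col]


def upsample_nearest(x: Grid, factor: int) -> Grid:
    """Nearest-neighbor upsampling by an integer factor."""
    return [[x[r // factor][c // factor] for c in range(len(x[0]) * factor)]
            for r in range(len(x) * factor)]


def max_pool(x: Grid, factor: int) -> Grid:
    """Non-overlapping max pooling with a factor x factor window."""
    return [[max(x[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor))
             for c in range(len(x[0]) // factor)]
            for r in range(len(x) // factor)]


def resize_to_level(x: Grid, src_level: int, dst_level: int) -> Grid:
    """Resize a feature from pyramid level src_level to dst_level.

    Assumes level l has stride 2**l: one level down doubles the resolution,
    one level up halves it.
    """
    if dst_level < src_level:
        return upsample_nearest(x, 2 ** (src_level - dst_level))
    if dst_level > src_level:
        return max_pool(x, 2 ** (dst_level - src_level))
    return x


def merge_sum(a: Grid, a_level: int, b: Grid, b_level: int, out_level: int) -> Grid:
    """Hypothetical 'sum' connection: resize both inputs to out_level, then add."""
    a = resize_to_level(a, a_level, out_level)
    b = resize_to_level(b, b_level, out_level)
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]
```

    For example, merging a 2x2 level-4 map into a 4x4 level-3 map repeats each level-4 value over a 2x2 block before the addition.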

    Introduction
    • Learning visual feature representations is a fundamental problem in computer vision. In the past few years, great progress has been made on designing the model architecture of deep convolutional networks (ConvNets) for image classification [12, 15, 35] and object detection [21, 22].
    • Unlike image classification, which predicts class probabilities for an image, object detection faces its own challenge: detecting and localizing multiple objects across a wide range of scales and locations.
    • To address this issue, pyramidal feature representations, which represent an image with multiscale feature layers, are used by many modern object detectors [11, 23, 26].
    • The high-level features, which are semantically strong but lower resolution, are up-
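    For reference, the top-down pathway of the standard FPN [22] upsamples coarse, semantically strong features and adds them to finer ones, level by level. The sketch below is a simplification under stated assumptions: single-channel maps as Python lists, consecutive levels whose resolution doubles at each step down, and identity lateral connections in place of FPN's 1x1 lateral and 3x3 output convolutions. The function names are hypothetical.

```python
from typing import Dict, List

Grid = List[List[float]]  # one channel of a feature map, indexed [row][col]


def upsample2x(x: Grid) -> Grid:
    """Nearest-neighbor 2x upsampling."""
    return [[x[r // 2][c // 2] for c in range(2 * len(x[0]))]
            for r in range(2 * len(x))]


def add(a: Grid, b: Grid) -> Grid:
    """Elementwise sum of two equally sized grids."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(features: Dict[int, Grid]) -> Dict[int, Grid]:
    """Simplified FPN top-down pathway over {level: feature}.

    Starting from the coarsest level, each output is the lateral input at that
    level plus the 2x-upsampled output of the level directly above it.
    Assumes the levels are consecutive integers (e.g. 3, 4, 5).
    """
    levels = sorted(features)
    out = {levels[-1]: features[levels[-1]]}  # coarsest level passes through
    for lvl in reversed(levels[:-1]):
        out[lvl] = add(features[lvl], upsample2x(out[lvl + 1]))
    return out
```

    The key design point is that information flows from coarse to fine in a single fixed pattern; the search described in this paper replaces that hand-designed pattern with learned cross-scale connections.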
    Highlights
    • Learning visual feature representations is a fundamental problem in computer vision
    • The pyramidal feature representations, which represent an image with multiscale feature layers, are commonly used by many modern object detectors [11, 23, 26]
    • We aim to discover an atomic architecture that has identical input and output feature levels and can be applied repeatedly
    • In Appendix A, we show NAS-Feature Pyramid Network can be used for anytime detection
    • We proposed to use Neural Architecture Search to further optimize the process of designing Feature Pyramid Networks for Object Detection
    • Our experiments on the COCO dataset showed that the discovered architecture, named NAS-Feature Pyramid Network, is flexible and performant for building accurate detection models
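    The "atomic architecture" highlighted above is useful because its output has the same pyramid levels as its input: it can be stacked, and every intermediate output is itself a valid pyramid, which is what makes early exit for anytime detection possible. The sketch shows only that interface. toy_cell is a hypothetical placeholder, not the discovered NAS-FPN cell; each level is reduced to a single float; and LEVELS assumes the P3-P7 levels used by RetinaNet.

```python
from typing import Callable, Dict, List

Pyramid = Dict[int, float]  # toy stand-in: one scalar per pyramid level
Cell = Callable[[Pyramid], Pyramid]

LEVELS = (3, 4, 5, 6, 7)  # assumed P3-P7 pyramid levels


def toy_cell(p: Pyramid) -> Pyramid:
    """Hypothetical cell: averages each level with the mean of its neighbors.

    Only the interface matters here: the same levels go in and come out.
    """
    out = {}
    for lvl in LEVELS:
        neighbors = [p[n] for n in (lvl - 1, lvl + 1) if n in p]
        out[lvl] = 0.5 * p[lvl] + 0.5 * sum(neighbors) / len(neighbors)
    return out


def stack(cell: Cell, p: Pyramid, repeats: int) -> List[Pyramid]:
    """Apply the cell `repeats` times and keep every intermediate pyramid.

    Because each output has the same levels as the input, any intermediate
    pyramid could feed a detection head (early exit for anytime detection).
    """
    outputs = []
    for _ in range(repeats):
        p = cell(p)
        assert set(p) == set(LEVELS), "cell must preserve pyramid levels"
        outputs.append(p)
    return outputs
```

    For example, stack(toy_cell, {l: float(l) for l in LEVELS}, 3) returns three pyramids; a detector with a tight latency budget could stop after the first one, while a slower but more accurate one would use the last.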
    Methods
    • The authors' method is based on the RetinaNet framework [23] because it is simple and efficient.
    • The RetinaNet framework has two main components: a backbone network and a feature pyramid network (FPN).
    • To discover a better FPN, the authors make use of the Neural Architecture Search framework proposed by [44].
    • In this framework, a controller proposes candidate architectures and, through trial and error, learns to generate better architectures over time.
    • As previous works have identified [36, 44, 45], the search space plays a crucial role in the success of architecture search.
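    The trial-and-error loop of the NAS framework [44] can be illustrated with a toy policy-gradient search. This is a swapped-in simplification, not the paper's method: as we understand it, the paper trains an RNN controller (its reference list includes PPO [33]), whereas the sketch uses plain REINFORCE with a moving-average baseline over a single categorical decision. toy_reward is a hypothetical stand-in for training a candidate detector and measuring its accuracy.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(choice: int) -> float:
    """Hypothetical stand-in for 'train the sampled model, report validation AP'."""
    return [0.20, 0.35, 0.30, 0.25][choice]


def search(num_choices: int = 4, steps: int = 3000, lr: float = 0.2,
           seed: int = 0) -> List[float]:
    """REINFORCE with a moving-average baseline over one categorical decision."""
    rng = random.Random(seed)
    logits = [0.0] * num_choices
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one "architecture" from the current policy.
        choice = rng.choices(range(num_choices), weights=probs)[0]
        reward = toy_reward(choice)
        baseline = 0.9 * baseline + 0.1 * reward
        advantage = reward - baseline
        # d log softmax(logits)[choice] / d logits[k] = 1[k == choice] - probs[k]
        for k in range(num_choices):
            logits[k] += lr * advantage * ((1.0 if k == choice else 0.0) - probs[k])
    return softmax(logits)
```

    With the default seed, the most probable choice after training is index 1, the option with the highest toy reward. The baseline does not depend on the current sample, so subtracting it reduces the variance of the updates without changing their expected direction.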
    Results
    • In Figure 8a, the authors show that stacking the vanilla FPN architecture does not always improve performance whereas stacking NAS-FPN improves accuracy significantly.
    Conclusion
    • The authors proposed to use Neural Architecture Search to further optimize the process of designing Feature Pyramid Networks for Object Detection.
    • The authors' experiments on the COCO dataset showed that the discovered architecture, named NAS-FPN, is flexible and performant for building accurate detection models.
    • Across a wide range of accuracy and speed tradeoffs, NAS-FPN produces significant improvements upon many backbone architectures.
    • (Footnote 2) https://github.com/tensorflow/models/tree/master/research/object_detection
    Tables
    • Table 1: Performance of RetinaNet with NAS-FPN and other state-of-the-art detectors on the COCO test-dev set
    References
    • [1] E. H. Adelson, C. H. Anderson, J. R. Bergen, P. J. Burt, and J. M. Ogden. Pyramid methods in image processing. RCA Engineer, 1984.
    • [2] B. Baker, O. Gupta, N. Naik, and R. Raskar. Designing neural network architectures using reinforcement learning. In ICLR, 2017.
    • [3] T. Bolukbasi, J. Wang, O. Dekel, and V. Saligrama. Adaptive neural networks for efficient inference. In ICML, 2017.
    • [4] L.-C. Chen, M. D. Collins, Y. Zhu, G. Papandreou, B. Zoph, F. Schroff, H. Adam, and J. Shlens. Searching for efficient multi-scale architectures for dense image prediction. In NIPS, 2018.
    • [5] D. Oñoro-Rubio, M. Niepert, and R. J. López-Sastre. Learning short-cut connections for object counting. In BMVC, 2018.
    • [6] T. Elsken, J. H. Metzen, and F. Hutter. Neural architecture search: A survey. arXiv preprint arXiv:1808.05377, 2018.
    • [7] C. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg. DSSD: Deconvolutional single shot detector. CoRR, abs/1701.06659, 2017.
    • [8] G. Ghiasi and C. C. Fowlkes. Laplacian pyramid reconstruction and refinement for semantic segmentation. In ECCV, 2016.
    • [9] G. Ghiasi, T. Lin, and Q. V. Le. DropBlock: A regularization method for convolutional networks. In NIPS, 2018.
    • [10] R. Girshick, I. Radosavovic, G. Gkioxari, P. Dollár, and K. He. Detectron. https://github.com/facebookresearch/detectron, 2018.
    • [11] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. In ICCV, 2017.
    • [12] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
    • [13] G. Huang, D. Chen, T. Li, F. Wu, L. van der Maaten, and K. Q. Weinberger. Multi-scale dense networks for resource efficient image classification. In ICLR, 2018.
    • [14] G. Huang, D. Chen, T. Li, F. Wu, L. van der Maaten, and K. Q. Weinberger. Multi-scale dense networks for resource efficient image classification. In ICLR, 2018.
    • [15] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In CVPR, 2017.
    • [16] T. Kong, F. Sun, W. Huang, and H. Liu. Deep feature pyramid reconfiguration for object detection. In ECCV, 2018.
    • [17] T. Kong, F. Sun, A. Yao, H. Liu, M. Lu, and Y. Chen. RON: Reverse connection with objectness prior networks for object detection. In CVPR, 2017.
    • [18] H. Law and J. Deng. CornerNet: Detecting objects as paired keypoints. In ECCV, 2018.
    • [19] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu. Deeply-supervised nets. In AISTATS, 2015.
    • [20] H. Li, P. Xiong, J. An, and L. Wang. Pyramid attention network for semantic segmentation. In BMVC, 2018.
    • [21] Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng, and J. Sun. DetNet: A backbone network for object detection. In ECCV, 2018.
    • [22] T.-Y. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie. Feature pyramid networks for object detection. In CVPR, 2017.
    • [23] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. In ICCV, 2017.
    • [24] C. Liu, B. Zoph, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy. Progressive neural architecture search. In ECCV, 2018.
    • [25] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia. Path aggregation network for instance segmentation. In CVPR, 2018.
    • [26] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single shot multibox detector. In ECCV, 2016.
    • [27] M. A. Islam, M. Rochan, N. D. B. Bruce, and Y. Wang. Gated feedback refinement network for dense image labeling. In CVPR, 2017.
    • [28] A. Newell, K. Yang, and J. Deng. Stacked hourglass networks for human pose estimation. In ECCV, 2016.
    • [29] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le. Regularized evolution for image classifier architecture search. In AAAI, 2019.
    • [30] J. Redmon and A. Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
    • [31] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention, 2015.
    • [32] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In CVPR, 2018.
    • [33] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
    • [34] S.-W. Kim, H.-K. Kook, J.-Y. Sun, M.-C. Kang, and S.-J. Ko. Parallel feature pyramid network for object detection. In ECCV, 2018.
    • [35] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
    • [36] M. Tan, B. Chen, R. Pang, V. Vasudevan, and Q. V. Le. MnasNet: Platform-aware neural architecture search for mobile. arXiv preprint arXiv:1807.11626, 2018.
    • [37] S. Teerapittayanon, B. McDanel, and H. Kung. BranchyNet: Fast inference via early exiting from deep neural networks. In ICPR, pages 2464–2469. IEEE, 2016.
    • [38] S. Woo, S. Hwang, and I. S. Kweon. StairNet: Top-down semantic aggregation for accurate one shot detection. In WACV, 2018.
    • [39] Y. Kim, B.-N. Kang, and D. Kim. SAN: Learning relationship between convolutional features for multi-scale object detection. In ECCV, 2018.
    • [40] F. Yu, D. Wang, E. Shelhamer, and T. Darrell. Deep layer aggregation. In CVPR, 2018.
    • [41] S. Zhang, L. Wen, X. Bian, Z. Lei, and S. Z. Li. Single-shot refinement neural network for object detection. In CVPR, 2018.
    • [42] Q. Zhao, T. Sheng, Y. Wang, Z. Tang, Y. Chen, L. Cai, and H. Ling. M2Det: A single-shot object detector based on multi-level feature pyramid network. In AAAI, 2019.
    • [43] P. Zhou, B. Ni, C. Geng, J. Hu, and Y. Xu. Scale-transferrable object detection. In CVPR, 2018.
    • [44] B. Zoph and Q. V. Le. Neural architecture search with reinforcement learning. In ICLR, 2017.
    • [45] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. Learning transferable architectures for scalable image recognition. In CVPR, 2018.