Corner Proposal Network for Anchor-free, Two-stage Object Detection

Keywords:
average precision, corner proposal, real-time object detection, proposal network, deep convolutional

Abstract:

The goal of object detection is to determine the class and location of objects in an image. This paper proposes a novel anchor-free, two-stage framework, which first extracts a number of object proposals by finding potential corner keypoint combinations, and then assigns a class label to each proposal by a standalone classification stage. …
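To make the two-stage idea concrete, here is a minimal Python sketch of the pipeline the abstract describes: corner keypoints are paired into class-consistent candidate boxes, and a standalone classifier then scores each candidate. All names (extract_proposals, score_fn, the tuple layouts) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; the real CPN uses learned corner heatmaps and a
# CNN classifier. Corners are (x, y, class_id); boxes are (x1, y1, x2, y2, cls).

def extract_proposals(tl_corners, br_corners):
    """Stage 1: pair top-left and bottom-right corners of the same class,
    keeping only geometrically valid combinations as object proposals."""
    proposals = []
    for xl, yl, cl in tl_corners:
        for xr, yr, cr in br_corners:
            if cl == cr and xr > xl and yr > yl:  # same class, valid geometry
                proposals.append((xl, yl, xr, yr, cl))
    return proposals

def classify_proposals(proposals, score_fn, threshold=0.5):
    """Stage 2: a standalone classifier scores each proposal; boxes that
    look like false positives (low score) are discarded."""
    return [(p, score_fn(p)) for p in proposals if score_fn(p) >= threshold]

# Toy usage: two top-left and two bottom-right corners yield two proposals.
tl = [(10, 10, 0), (50, 40, 1)]
br = [(90, 80, 0), (120, 130, 1)]
dets = classify_proposals(extract_proposals(tl, br), lambda p: 0.9)
```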
Introduction
  • Powered by the rapid development of deep learning [21], in particular deep convolutional neural networks [18,35,13], researchers have designed effective algorithms for object detection [11].
  • This is a challenging task since objects can appear in any scale, shape, and position in a natural image, yet the appearance of objects of the same class can be very different.
Highlights
  • Powered by the rapid development of deep learning [21], in particular deep convolutional neural networks [18,35,13], researchers have designed effective algorithms for object detection [11].
  • This paper provides an alternative perspective on the design of object detection approaches.
  • We report results for the Corner Proposal Network (CPN), the method proposed in this paper, demonstrating that CPN inherits the merits of CenterNet and CornerNet and offers greater flexibility in locating objects, especially those with peculiar shapes.
  • We present an anchor-free, two-stage object detection framework.
  • With the above two stages, the recall and precision of detection are significantly improved, and the final result ranks among the top of existing object detection methods.
  • The most important take-away is that anchor-free methods are more flexible in proposal extraction, while a separate discrimination stage is required to improve precision. When implemented properly, such a two-stage framework can also be efficient at inference time.
Methods
  • (Table 1 row for CPN with an HG-104 backbone: 68.8, 88.2, 93.7, 95.8, 99.1, 54.4, 50.6, 46.2, 35.4.) One of the compared anchor-free approaches does not report a much higher recall on large objects.
  • This is unexpected, because large objects should be easier to detect, as the other three anchor-free methods suggest.
  • Note that these three methods share a similar way of extracting corner keypoints, but CornerNet suffers from large FD (false discovery) values due to the lack of validation beyond the proposals.
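The false-discovery issue mentioned above can be made concrete with a small sketch. This uses a generic definition (the share of detections that match no ground-truth box at a given IoU threshold) under greedy one-to-one matching; the paper's exact protocol may differ.

```python
# A generic false-discovery computation, not the paper's exact protocol.
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def false_discovery_rate(detections, gt_boxes, thr=0.5):
    """Fraction of detections that fail to match any unused ground truth."""
    unused, fp = list(gt_boxes), 0
    for det in detections:
        best = max(unused, key=lambda g: iou(det, g), default=None)
        if best is not None and iou(det, best) >= thr:
            unused.remove(best)        # matched: consume this ground truth
        else:
            fp += 1                    # unmatched detection: false positive
    return fp / len(detections) if detections else 0.0
```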
Results
  • The two classification options provide complementary information, so that combining them can further improve the AP by more than 1%.
Conclusion
  • The authors present an anchor-free, two-stage object detection framework.
  • The most important take-away is that anchor-free methods are more flexible in proposal extraction, while a separate discrimination stage is required to improve precision.
  • When implemented properly, such a two-stage framework can also be efficient at inference time.
  • The debate over using one-stage or two-stage detectors therefore seems not to be critical.
Tables
  • Table1: Comparison of the average recall (AR) of anchor-based and anchor-free detection methods. Here, the average recall is recorded for targets of different aspect ratios and different sizes. To explore the upper limit of the average recall for each method, we exclude the impact of bounding-box class labels and scores on recall, and compute it by allowing at most 1000 object proposals. AR1+, AR2+, AR3+ and AR4+ denote average recall for boxes with area in the ranges [96², 200²), [200², 300²), [300², 400²), and [400², +∞), respectively (a simplified sketch of this bucketed recall appears after this table list). ‘X’ and ‘HG’ stand for ResNeXt and Hourglass, respectively
  • Table2: Anchor-free detection methods such as CornerNet and CenterNet suffer from a large number of false positives and can benefit from incorporating richer semantics for judgment. Here, APoriginal, APrefined, and APcorrect indicate the AP of the original output, after non-object proposals are removed, and after the correct label is assigned to each surviving proposal, respectively. Both APrefined and APcorrect require ground-truth labels
  • Table3: Inference accuracy (%) of CPN and state-of-the-art detectors on the COCO test-dev set. CPN ranks among the top of state-of-the-art detectors. ‘R’, ‘X’, ‘HG’, ‘DCN’ and ‘†’ denote ResNet, ResNeXt, Hourglass, Deformable Convolution Network [7], and multi-scale training or testing, respectively
  • Table4: The detection performance (%) of different classification options on CPN
  • Table5: We report the average false discovery rates (%, lower is better) for CornerNet, CenterNet and CPN on the MS-COCO validation dataset. The results show that our approach generates fewer false positives. Under the same corner keypoint extractor, this is the key to outperforming the baselines in the AP metrics
  • Table6: The detection performance (%) of using different ways (instance embedding and binary classification) to determine the validity of a proposal
  • Table7: Inference speed of CPN under different conditions vs. other detectors on the MS-COCO validation dataset. FPS is measured on an NVIDIA Tesla-V100 GPU. CPN achieves a good trade-off between accuracy and speed
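As noted in the Table 1 caption, here is a simplified sketch of the area-bucketed recall: each ground-truth box falls into one area bucket, and counts as recalled if any proposal overlaps it at a given IoU. The single-IoU-threshold form below is an assumption for brevity; the actual metric averages over IoU thresholds and caps the proposal count at 1000.

```python
# Simplified, single-IoU-threshold version of the area-bucketed recall.
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# Area buckets matching AR1+ .. AR4+ in Table 1.
BUCKETS = [(96**2, 200**2), (200**2, 300**2),
           (300**2, 400**2), (400**2, float("inf"))]

def bucketed_recall(proposals, gt_boxes, thr=0.5):
    """Per-bucket recall: recalled boxes / total boxes in each area range."""
    hits, totals = [0] * len(BUCKETS), [0] * len(BUCKETS)
    for g in gt_boxes:
        area = (g[2] - g[0]) * (g[3] - g[1])
        for i, (lo, hi) in enumerate(BUCKETS):
            if lo <= area < hi:
                totals[i] += 1
                if any(iou(p, g) >= thr for p in proposals):
                    hits[i] += 1
    return [h / t if t else None for h, t in zip(hits, totals)]
```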
Related work
  • Object detection is an important yet challenging problem in computer vision. It aims to obtain a tight bounding-box as well as a class label for each object in an image. In recent years, with the rapid development of deep learning, most powerful object detection methods are based on training deep neural networks [11,10]. According to the way of determining the geometry and class of an object, existing detection approaches can be roughly categorized into anchor-based and anchor-free methods.

    An anchor-based approach starts with placing a large number of anchors, which are regional proposals with different but fixed scales and shapes, uniformly distributed on the image plane. These anchors are then considered as object proposals, and an individual classifier is trained to determine the objectness as well as the class of each proposal [34]. Beyond this framework, researchers have made efforts in two directions, namely, improving the basic quality of regional features extracted from the proposal, and arriving at a better alignment between the proposals and features. For the first direction, typical examples include using more powerful network backbones [13,36,14] and using hierarchical features to represent a region [23,34,27]. Regarding the second, there exist methods to align anchors to features [46,47], align features to anchors [7,5], and adjust the anchors after classification has been done [34,27,3].
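A toy sketch of the anchor placement described above: boxes of fixed scales and aspect ratios are tiled uniformly across the image plane. The stride, scales, and ratios below are illustrative defaults, not values from any cited method.

```python
import itertools
import math

def make_anchors(img_w, img_h, stride=16,
                 scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Tile scale-by-ratio anchors at every stride-spaced grid center."""
    anchors = []
    centers_y = range(stride // 2, img_h, stride)
    centers_x = range(stride // 2, img_w, stride)
    for cy, cx in itertools.product(centers_y, centers_x):
        for s, r in itertools.product(scales, ratios):
            w, h = s * math.sqrt(r), s / math.sqrt(r)  # keep area s*s fixed
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

# A 512x512 image with these settings yields 32 * 32 * 9 = 9216 anchors,
# each of which a classifier must then score for objectness and class.
print(len(make_anchors(512, 512)))  # 9216
```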
Study subjects and analysis
  • We use an Adam [16] optimizer to train our model. On eight NVIDIA Tesla-V100 (32GB) GPUs, we use a batch size of 48 (6 samples on each card) and train the model for 200K iterations with a base learning rate of 2.5 × 10−4, followed by another 50K iterations with a reduced learning rate of 2.5 × 10−5. The training lasts about 9 days, 5 days, and 3 days for Hourglass-104, Hourglass-52, and DLA-34, respectively.
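The schedule above maps to a PyTorch training loop roughly as follows; the model and data here are placeholders (CPN's actual losses and data pipeline are not shown), so treat this as a sketch of the optimizer settings only.

```python
import torch
from torch import nn

model = nn.Conv2d(3, 1, 3, padding=1)      # placeholder for the detector
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4)

for it in range(250_000):                  # 200K + 50K iterations
    if it == 200_000:
        for group in optimizer.param_groups:
            group["lr"] = 2.5e-5           # reduced learning rate
    x = torch.randn(6, 3, 64, 64)          # 6 samples per card; 8 cards -> 48
    loss = model(x).mean()                 # placeholder for detection loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```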

References
  • Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
  • Bodla, N., Singh, B., Chellappa, R., Davis, L.S.: Soft-NMS: Improving object detection with one line of code. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 5561–5569 (2017)
  • Cai, Z., Vasconcelos, N.: Cascade R-CNN: Delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 6154–6162 (2018)
  • Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Xu, J., Zhang, Z., Cheng, D., Zhu, C., Cheng, T., Zhao, Q., Li, B., Lu, X., Zhu, R., Wu, Y., Dai, J., Wang, J., Shi, J., Ouyang, W., Loy, C.C., Lin, D.: MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019)
  • Chen, Y., Han, C., Wang, N., Zhang, Z.: Revisiting feature alignment for one-stage object detection. arXiv preprint arXiv:1908.01570 (2019)
  • Dai, J., Li, Y., He, K., Sun, J.: R-FCN: Object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems. pp. 379–387 (2016)
  • Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 764–773 (2017)
  • Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: CenterNet: Keypoint triplets for object detection. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 6569–6578 (2019)
  • Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision 88(2), 303–338 (2010)
  • Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 1440–1448 (2015)
  • Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 580–587 (2014)
  • He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2961–2969 (2017)
  • He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)
  • Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4700–4708 (2017)
  • Huang, L., Yang, Y., Deng, Y., Yu, Y.: DenseBox: Unifying landmark localization with end to end object detection. arXiv preprint arXiv:1509.04874 (2015)
  • Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  • Kong, T., Sun, F., Liu, H., Jiang, Y., Li, L., Shi, J.: FoveaBox: Beyond anchor-based object detection. IEEE Transactions on Image Processing (2020)
  • Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. pp. 1097–1105 (2012)
  • Law, H., Deng, J.: CornerNet: Detecting objects as paired keypoints. In: European Conference on Computer Vision. pp. 734–750 (2018)
  • Law, H., Teng, Y., Russakovsky, O., Deng, J.: CornerNet-Lite: Efficient keypoint based object detection. arXiv preprint arXiv:1904.08900 (2019)
  • LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
  • Li, Y., Chen, Y., Wang, N., Zhang, Z.: Scale-aware trident networks for object detection. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 6054–6063 (2019)
  • Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2117–2125 (2017)
  • Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollar, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2980–2988 (2017)
  • Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: European Conference on Computer Vision. pp. 740–755 (2014)
  • Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8759–8768 (2018)
  • Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: Single shot multibox detector. In: European Conference on Computer Vision. pp. 21–37 (2016)
  • Lu, X., Li, B., Yue, Y., Li, Q., Yan, J.: Grid R-CNN. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 7363–7372 (2019)
  • Newell, A., Huang, Z., Deng, J.: Associative embedding: End-to-end learning for joint detection and grouping. In: Advances in Neural Information Processing Systems. pp. 2277–2287 (2017)
  • Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: European Conference on Computer Vision. pp. 483–499 (2016)
  • Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., Lin, D.: Libra R-CNN: Towards balanced learning for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 821–830 (2019)
  • Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)
  • Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 779–788 (2016)
  • Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems. pp. 91–99 (2015)
  • Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  • Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 5693–5703 (2019)
  • Tan, M., Le, Q.V.: EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946 (2019)
  • Tan, M., Pang, R., Le, Q.V.: EfficientDet: Scalable and efficient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 10781–10790 (2020)
  • Tian, Z., Shen, C., Chen, H., He, T.: FCOS: Fully convolutional one-stage object detection. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 9627–9636 (2019)
  • Tychsen-Smith, L., Petersson, L.: DeNet: Scalable real-time object detection with directed sparse sampling. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 428–436 (2017)
  • Wang, J., Chen, K., Yang, S., Loy, C.C., Lin, D.: Region proposal by guided anchoring. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2965–2974 (2019)
  • Yang, Z., Liu, S., Hu, H., Wang, L., Lin, S.: RepPoints: Point set representation for object detection. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 9657–9666 (2019)
  • Yu, F., Wang, D., Shelhamer, E., Darrell, T.: Deep layer aggregation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2403–2412 (2018)
  • Yu, J., Jiang, Y., Wang, Z., Cao, Z., Huang, T.: UnitBox: An advanced object detection network. In: Proceedings of the 24th ACM International Conference on Multimedia. pp. 516–520 (2016)
  • Zhang, S., Chi, C., Yao, Y., Lei, Z., Li, S.Z.: Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9759–9768 (2020)
  • Zhang, S., Wen, L., Bian, X., Lei, Z., Li, S.Z.: Single-shot refinement neural network for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4203–4212 (2018)
  • Zhang, X., Wan, F., Liu, C., Ji, R., Ye, Q.: FreeAnchor: Learning to match anchors for visual object detection. In: Advances in Neural Information Processing Systems. pp. 147–155 (2019)
  • Zhou, X., Wang, D., Krahenbuhl, P.: Objects as points. arXiv preprint arXiv:1904.07850 (2019)
  • Zhou, X., Zhuo, J., Krahenbuhl, P.: Bottom-up object detection by grouping extreme and center points. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 850–859 (2019)
  • Zhu, C., Chen, F., Shen, Z., Savvides, M.: Soft anchor-point object detection. arXiv preprint arXiv:1911.12448 (2019)
  • Zhu, C., He, Y., Savvides, M.: Feature selective anchor-free module for single-shot object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 840–849 (2019)