D2Det: Towards High Quality Object Detection and Instance Segmentation

CVPR, pp. 11482-11491, 2020.

Keywords:
instance segmentation, Grid R-CNN, box offset, region proposal network, Hybrid Task Cascade

Abstract:

We propose a novel two-stage detection method, D2Det, that collectively addresses both precise localization and accurate classification. For precise localization, we introduce a dense local regression that predicts multiple dense box offsets for an object proposal. Different from traditional regression and keypoint-based localization empl...
Introduction
  • Recent years have witnessed formidable progress in object detection thanks to the advances in deep neural networks.
  • The regression module utilizes several fully connected layers to predict a single box offset for the candidate proposal (see the sketch after this list).
  • Grid R-CNN [36] extends Faster R-CNN by separating the classification and regression into two branches, as opposed to a shared network.
  • Instead of the regression utilized in Faster R-CNN, Grid R-CNN introduces a localization scheme, based on a fully convolutional network, that searches for a set of keypoints in a fixed-sized region to identify an object boundary
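
As a reference point for the bullets above, the following is a minimal PyTorch-style sketch of such a conventional regression module: fully connected layers map one pooled RoI feature to class scores and a single 4-d box offset per proposal. The layer sizes, the class-agnostic offset, and all names here are illustrative assumptions, not the exact Faster R-CNN configuration.

    import torch
    import torch.nn as nn

    class SingleOffsetBoxHead(nn.Module):
        """Conventional Fast/Faster R-CNN style head: fully connected layers
        map one pooled RoI feature to class scores and a single 4-d box offset
        (dx, dy, dw, dh) for the whole proposal."""

        def __init__(self, in_channels=256, roi_size=7, fc_dim=1024, num_classes=80):
            super().__init__()
            flat = in_channels * roi_size * roi_size
            self.fc1 = nn.Linear(flat, fc_dim)
            self.fc2 = nn.Linear(fc_dim, fc_dim)
            self.cls_score = nn.Linear(fc_dim, num_classes + 1)  # classification branch
            self.bbox_pred = nn.Linear(fc_dim, 4)                # one box offset per proposal

        def forward(self, roi_feats):                 # roi_feats: (N, C, roi_size, roi_size)
            x = roi_feats.flatten(start_dim=1)
            x = torch.relu(self.fc1(x))
            x = torch.relu(self.fc2(x))
            return self.cls_score(x), self.bbox_pred(x)
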
Highlights
  • Recent years have witnessed formidable progress in object detection thanks to the advances in deep neural networks
  • Contributions: We propose a two-stage object detection approach, D2Det, that targets both precise localization and accurate classification
  • To further improve our dense local regression, we introduce a binary overlap prediction that identifies each sub-region of a candidate proposal as an object region or background region, thereby reducing the influence of background region
  • For accurate classification of the target object, we introduce a discriminative RoI pooling that first samples features from various sub-regions and performs an adaptive weighted pooling that aims to generate discriminative features
  • We propose a two-stage detection method that addresses both precise object localization and accurate classification
  • We introduce dense local regression, which predicts multiple dense box offsets for a proposal (see the sketch after this list).
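
The sketch below illustrates, under stated assumptions, the dense local regression and binary overlap prediction named in the highlights: a small fully convolutional head predicts a 4-d box offset and an object/background score at every sub-region of the pooled RoI, and background sub-regions are masked out when the offsets are aggregated into one box. The layer sizes, the sigmoid thresholding, and the averaging-based decoding are illustrative choices, not the paper's exact formulation.

    import torch
    import torch.nn as nn

    class DenseLocalRegressionHead(nn.Module):
        """Sketch of dense local regression: every k x k sub-region of the pooled
        RoI predicts its own 4-d box offset, plus a binary overlap score that
        marks the sub-region as object or background."""

        def __init__(self, in_channels=256, k=7):
            super().__init__()
            self.k = k
            self.conv = nn.Sequential(
                nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            )
            self.offset = nn.Conv2d(256, 4, 1)    # per-location box offset
            self.overlap = nn.Conv2d(256, 1, 1)   # per-location object/background score

        def forward(self, roi_feats):                      # roi_feats: (N, C, k, k)
            x = self.conv(roi_feats)
            offsets = self.offset(x)                       # (N, 4, k, k) dense local offsets
            overlap = torch.sigmoid(self.overlap(x))       # (N, 1, k, k) overlap probability
            return offsets, overlap

    def aggregate_boxes(offsets, overlap, thresh=0.5):
        """Illustrative decoding: average the offsets of sub-regions predicted as
        object, so background sub-regions do not influence the final box (this
        mirrors the stated goal, not the paper's exact aggregation)."""
        mask = (overlap > thresh).float()                           # (N, 1, k, k)
        weight = mask / mask.sum(dim=(2, 3), keepdim=True).clamp(min=1e-6)
        return (offsets * weight).sum(dim=(2, 3))                   # (N, 4) final offsets
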
Methods
  • Single-stage methods compared: RetinaNet w/ FPN [32] (ResNet101 backbone), ConRetinaNet w/ FPN [25], EFGRNet [38], CornerNet [26] (Hourglass-104 backbone), FSAF w/ FPN [53], RPDet w/ FPN [50], and FCOS w/ FPN [45].
  • Also compared: Mask R-CNN [19], PANet [34], and D2Det (ours).
  • Our D2Det achieves superior results compared to existing works reported on the iSAID dataset [51].
  • Fig. 6 shows qualitative results on MS COCO test-dev and the iSAID test set.
Conclusion
  • The authors propose a two-stage detection method that addresses both precise object localization and accurate classification.
  • The authors introduce dense local regression that predicts multiple dense box offsets for a proposal.
  • A discriminative RoI pooling scheme is proposed that performs adaptive weighting to enhance discriminative features (a rough sketch follows this list).
  • Our D2Det achieves state-of-the-art detection results on MS COCO and UAVDT.
  • The authors present instance segmentation results on MS COCO and iSAID, achieving promising performance compared to existing methods.
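
As a companion to the conclusion above, here is a rough sketch of the adaptive weighting idea behind discriminative RoI pooling: a light-weight branch predicts a weight per spatial position of the pooled RoI, the weights are normalized, and the feature map is re-weighted so that discriminative sub-regions contribute more. The sub-region feature sampling step is omitted and all layer choices are assumptions for illustration; this is not the paper's exact pooling operator.

    import torch.nn as nn
    import torch.nn.functional as F

    class AdaptiveWeightedPooling(nn.Module):
        """Sketch of adaptive weighted pooling: predict a normalized weight for
        each spatial position of the pooled RoI feature and re-weight it so the
        most discriminative sub-regions dominate the representation."""

        def __init__(self, in_channels=256):
            super().__init__()
            self.weight_branch = nn.Sequential(
                nn.Conv2d(in_channels, 64, 1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 1, 1),
            )

        def forward(self, roi_feats):                   # roi_feats: (N, C, k, k)
            n, c, k, _ = roi_feats.shape
            logits = self.weight_branch(roi_feats)      # (N, 1, k, k)
            w = F.softmax(logits.flatten(2), dim=-1).view(n, 1, k, k)
            # scale by k*k so the re-weighted map keeps roughly the original magnitude
            return roi_feats * w * (k * k)
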
Tables
  • Table 1: State-of-the-art object detection comparison (in terms of AP) on MS COCO test-dev. When using a ResNet101 backbone with FPN, our D2Det achieves the best single-model performance, with an overall AP of 45.4, surpassing all existing two-stage methods employing the same backbone with FPN (TridentNet and Auto-FPN do not use FPN since they introduce alternative approaches). Further, our D2Det outperforms DCN v2 [54] by a gain of 3.4% when using the same ResNet101-deform v2 backbone. With multi-scale training and inference, our D2Det* achieves an overall AP of 50.1.
  • Table 2: Impact of integrating our dense local regression
  • Table 3: Comparisons of our dense local regression (DLR)
  • Table 4: Object detection performance comparison on
  • Table 5: State-of-the-art instance segmentation comparison (single-model performance) in Mask AP on MS COCO
  • Table 6: State-of-the-art instance segmentation comparison in Mask AP on the iSAID test set
Related work
  • In recent years, two-stage detection approaches [18, 43, 17, 44, 28, 36, 5, 46] have shown continuous performance improvements in terms of detection accuracy on standard benchmarks [33, 13]. Among existing two-stage detectors, Faster R-CNN [43] is one of the most popular frameworks for object detection. In the first stage, Faster R-CNN utilizes a region proposal network (RPN) to generate class-agnostic region proposals. The second stage, also known as Fast R-CNN [17], extracts a fixed-sized region-of-interest (RoI) feature representation, followed by the computation of classification scores and regressed bounding-box coordinates.
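
Below is a minimal sketch of this generic second stage using torchvision's roi_align: a fixed-size RoI feature is pooled for each class-agnostic RPN proposal, and a box head then produces classification scores and regressed box offsets. The feature stride (1/16), the 7x7 pooled size, and the box_head interface are illustrative assumptions, not the paper's exact configuration.

    from torchvision.ops import roi_align

    def second_stage(feature_map, proposals, box_head, spatial_scale=1.0 / 16):
        """Generic Fast R-CNN style second stage: pool a fixed-size RoI feature
        for every proposal from the RPN, then predict class scores and box
        regression deltas with any box head (e.g. a head like the single-offset
        sketch given earlier on this page)."""
        # feature_map: (B, C, H, W) backbone/FPN feature map
        # proposals:   list of (num_boxes_i, 4) tensors in (x1, y1, x2, y2) image coords
        pooled = roi_align(feature_map, proposals, output_size=(7, 7),
                           spatial_scale=spatial_scale, aligned=True)
        scores, deltas = box_head(pooled)            # classification + regression branches
        return scores, deltas
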
Funding
  • This work was supported by National Natural Science Foundation of China (Nos. 61906131, 61632018), Postdoctoral Program for Innovative Talents (No. BX20180214), and China Postdoctoral Science Foundation (No. 2018M641647).
References
  • [1] Navaneeth Bodla, Bharat Singh, Rama Chellappa, and Larry S. Davis. Soft-NMS: Improving object detection with one line of code. Proc. IEEE International Conf. Computer Vision, 2017.
  • [2] Zhaowei Cai and Nuno Vasconcelos. Cascade R-CNN: Delving into high quality object detection. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018.
  • [3] Jiale Cao, Yanwei Pang, Jungong Han, and Xuelong Li. Hierarchical shot detector. Proc. IEEE International Conf. Computer Vision, 2019.
  • [4] Jiale Cao, Yanwei Pang, and Xuelong Li. Triply supervised decoder networks for joint detection and segmentation. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019.
  • [5] Jiale Cao, Yanwei Pang, Shengjie Zhao, and Xuelong Li. High-level semantic networks for multi-scale object detection. IEEE Trans. on Circuits and Systems for Video Technology, 2020.
  • [6] Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. Hybrid task cascade for instance segmentation. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019.
  • [7] Liang-Chieh Chen, Alexander Hermans, George Papandreou, Florian Schroff, Peng Wang, and Hartwig Adam. MaskLab: Instance segmentation by refining object detection with semantic and direction features. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018.
  • [8] Bowen Cheng, Yunchao Wei, Honghui Shi, Rogerio Schmidt Feris, Jinjun Xiong, and Thomas S. Huang. Revisiting RCNN: On awakening the classification power of Faster RCNN. Proc. European Conf. Computer Vision, 2018.
  • [9] Jifeng Dai, Kaiming He, and Jian Sun. Instance-aware semantic segmentation via multi-task network cascades. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2016.
  • [10] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. Deformable convolutional networks. Proc. IEEE International Conf. Computer Vision, 2017.
  • [11] Dawei Du, Yuankai Qi, Hongyang Yu, Yifan Yang, Kaiwen Duan, Guorong Li, Weigang Zhang, Qingming Huang, and Qi Tian. The unmanned aerial vehicle benchmark: Object detection and tracking. Proc. European Conf. Computer Vision, 2018.
  • [12] Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, and Qi Tian. CenterNet: Keypoint triplets for object detection. Proc. IEEE International Conf. Computer Vision, 2019.
  • [13] Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.
  • [14] Ziteng Gao, Limin Wang, and Gangshan Wu. LIP: Local importance-based pooling. Proc. IEEE International Conf. Computer Vision, 2019.
  • [15] Golnaz Ghiasi, Tsung-Yi Lin, and Quoc V. Le. NAS-FPN: Learning scalable feature pyramid architecture for object detection. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019.
  • [16] Spyros Gidaris and Nikos Komodakis. Attend refine repeat: Active box proposal generation via in-out localization. Proc. British Machine Vision Conference, 2016.
  • [17] Ross Girshick. Fast R-CNN. Proc. IEEE International Conf. Computer Vision, 2015.
  • [18] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2014.
  • [19] Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. Mask R-CNN. Proc. IEEE International Conf. Computer Vision, 2017.
  • [20] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2016.
  • [21] Yihui He, Chenchen Zhu, Jianren Wang, Marios Savvides, and Xiangyu Zhang. Bounding box regression with uncertainty for accurate object detection. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019.
  • [22] Lichao Huang, Yi Yang, Yafeng Deng, and Yinan Yu. DenseBox: Unifying landmark localization with end to end object detection. arXiv:1509.04874, 2015.
  • [23] Zhaojin Huang, Lichao Huang, Yongchao Gong, Chang Huang, and Xinggang Wang. Mask scoring R-CNN. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019.
  • [24] Borui Jiang, Ruixuan Luo, Jiayuan Mao, Tete Xiao, and Yuning Jiang. Acquisition of localization confidence for accurate object detection. Proc. European Conf. Computer Vision, 2018.
  • [25] Tao Kong, Fuchun Sun, Huaping Liu, Yuning Jiang, and Jianbo Shi. Consistent optimization for single-shot object detection. arXiv:1901.06563, 2019.
  • [26] Hei Law and Jia Deng. CornerNet: Detecting objects as paired keypoints. Proc. European Conf. Computer Vision, 2018.
  • [27] Shuai Li, Lingxiao Yang, Jianqiang Huang, Xian-Sheng Hua, and Lei Zhang. Dynamic anchor feature selection for single-shot object detection. Proc. IEEE International Conf. Computer Vision, 2019.
  • [28] Yanghao Li, Yuntao Chen, Naiyan Wang, and Zhaoxiang Zhang. Scale-aware trident networks for object detection. Proc. IEEE International Conf. Computer Vision, 2019.
  • [29] Yazhao Li, Yanwei Pang, Jianbing Shen, Jiale Cao, and Ling Shao. NETNet: Neighbor erasing and transferring network for better single shot object detection. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2020.
  • [30] Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, and Yichen Wei. Fully convolutional instance-aware semantic segmentation. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017.
  • [31] Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017.
  • [32] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollar. Focal loss for dense object detection. Proc. IEEE International Conf. Computer Vision, 2017.
  • [33] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. Proc. European Conf. Computer Vision, 2014.
  • [34] Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. Path aggregation network for instance segmentation. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018.
  • [35] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. Proc. European Conf. Computer Vision, 2016.
  • [36] Xin Lu, Buyu Li, Yuxin Yue, Quanquan Li, and Junjie Yan. Grid R-CNN. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019.
  • [37] Xin Lu, Buyu Li, Yuxin Yue, Quanquan Li, and Junjie Yan. Grid R-CNN plus: Faster and better. arXiv:1906.05688, 2019.
  • [38] Jing Nie, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, and Ling Shao. Enriched feature guided refinement network for object detection. Proc. IEEE International Conf. Computer Vision, 2019.
  • [39] Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, and Dahua Lin. Libra R-CNN: Towards balanced learning for object detection. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019.
  • [40] Yanwei Pang, Tiancai Wang, R. M. Anwer, F. S. Khan, and L. Shao. Efficient featurized image pyramid network for single shot detector. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019.
  • [41] Junran Peng, Ming Sun, Zhaoxiang Zhang, Tieniu Tan, and Junjie Yan. POD: Practical object detection with scale-sensitive network. Proc. IEEE International Conf. Computer Vision, 2019.
  • [42] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2016.
  • [43] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. Proc. Advances in Neural Information Processing Systems, 2015.
  • [44] Bharat Singh, Mahyar Najibi, and Larry S. Davis. SNIPER: Efficient multi-scale training. Proc. Advances in Neural Information Processing Systems, 2018.
  • [45] Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. FCOS: Fully convolutional one-stage object detection. Proc. IEEE International Conf. Computer Vision, 2019.
  • [46] Jiaqi Wang, Kai Chen, Shuo Yang, Chen Change Loy, and Dahua Lin. Region proposal by guided anchoring. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019.
  • [47] Tiancai Wang, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, and Ling Shao. Learning rich features at high-speed for single-shot object detection. Proc. IEEE International Conf. Computer Vision, 2019.
  • [48] Zhenyu Wu, Karthik Suresh, Priya Narayanan, Hongyu Xu, Heesung Kwon, and Zhangyang Wang. Delving into robust object detection from unmanned aerial vehicles: A deep nuisance disentanglement approach. Proc. IEEE International Conf. Computer Vision, 2019.
  • [49] Hang Xu, Lewei Yao, Wei Zhang, Xiaodan Liang, and Zhenguo Li. Auto-FPN: Automatic network architecture adaptation for object detection beyond classification. Proc. IEEE International Conf. Computer Vision, 2019.
  • [50] Ze Yang, Shaohui Liu, Han Hu, Liwei Wang, and Stephen Lin. RepPoints: Point set representation for object detection. Proc. IEEE International Conf. Computer Vision, 2019.
  • [51] Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, and Xiang Bai. iSAID: A large-scale dataset for instance segmentation in aerial images. Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, 2019.
  • [52] Xingyi Zhou, Jiacheng Zhuo, and Philipp Krahenbuhl. Bottom-up object detection by grouping extreme and center points. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019.
  • [53] Chenchen Zhu, Yihui He, and Marios Savvides. Feature selective anchor-free module for single-shot object detection. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019.
  • [54] Xizhou Zhu, Han Hu, Stephen Lin, and Jifeng Dai. Deformable ConvNets v2: More deformable, better results. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019.