Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

IEEE Trans. Pattern Anal. Mach. Intell., Volume 39, Issue 6, 2017, Pages 1137-1149.

Cited by 21,451 · Indexed in EI and WOS
Keywords
Proposals · Object detection · Convolutional codes · Feature extraction · Search problems
Highlight
We have presented Region Proposal Networks for efficient and accurate region proposal generation.

Abstract

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.

Introduction
  • Recent advances in object detection are driven by the success of region proposal methods (e.g., [22]) and region-based convolutional neural networks (R-CNNs) [6].
  • Although region-based CNNs were computationally expensive as originally developed in [6], their cost has been drastically reduced thanks to sharing convolutions across proposals [7, 5].
  • Selective Search (SS) [22], one of the most popular methods, greedily merges superpixels based on engineered low-level features.
  • The region proposal step still consumes as much running time as the detection network.
Highlights
  • Recent advances in object detection are driven by the success of region proposal methods (e.g., [22]) and region-based convolutional neural networks (R-CNNs) [6]
  • We introduce novel Region Proposal Networks (RPNs) that share convolutional layers with state-of-the-art object detection networks [7, 5]
  • To unify RPNs with Fast R-CNN [5] object detection networks, we propose a simple training scheme that alternates between fine-tuning for the region proposal task and fine-tuning for object detection, while keeping the proposals fixed
  • We have presented Region Proposal Networks (RPNs) for efficient and accurate region proposal generation
  • Our method enables a unified, deep-learning-based object detection system to run at 5-17 fps.
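The alternating training scheme in the highlights above can be sketched as follows. This is a minimal sketch of the 4-step schedule only: the `train_rpn`/`train_fast_rcnn` placeholders and their arguments are hypothetical stand-ins that record what each step would train, not the authors' implementation.

```python
# Hedged sketch of 4-step alternating training (placeholders only).

def alternating_training():
    schedule = []

    def train_rpn(init, freeze_shared_conv=False):
        # Would fine-tune the RPN here; we only record the step.
        schedule.append(("rpn", init, freeze_shared_conv))

    def train_fast_rcnn(init, freeze_shared_conv=False):
        # Would fine-tune Fast R-CNN on the latest RPN proposals.
        schedule.append(("fast_rcnn", init, freeze_shared_conv))

    # Step 1: train the RPN, initialized from an ImageNet-pretrained model.
    train_rpn(init="imagenet")
    # Step 2: train Fast R-CNN on the Step-1 proposals; the detector is also
    # ImageNet-initialized, so the two networks do not yet share conv layers.
    train_fast_rcnn(init="imagenet")
    # Step 3: re-initialize the RPN from the detector's conv layers and
    # fine-tune only the RPN-specific layers, keeping shared convs fixed.
    train_rpn(init="detector", freeze_shared_conv=True)
    # Step 4: fine-tune only the Fast R-CNN-specific layers; the shared conv
    # layers stay fixed, yielding one unified network for both tasks.
    train_fast_rcnn(init="shared", freeze_shared_conv=True)
    return schedule

print(alternating_training())
```

After Step 4 the proposal and detection tasks run on a single shared backbone, which is what makes the proposal step nearly cost-free at test time.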
Methods
  • The authors comprehensively evaluate the method on the PASCAL VOC 2007 detection benchmark [4].
  • This dataset consists of about 5k trainval images and 5k test images over 20 object categories.
  • The authors provide results in the PASCAL VOC 2012 benchmark for a few models.
  • The authors primarily evaluate detection mean Average Precision (mAP), because this is the actual metric for object detection.
Conclusion
  • The authors have presented Region Proposal Networks (RPNs) for efficient and accurate region proposal generation.
  • By sharing convolutional features with the down-stream detection network, the region proposal step is nearly cost-free.
  • The authors' method enables a unified, deep-learning-based object detection system to run at 5-17 fps.
  • The learned RPN improves region proposal quality and the overall object detection accuracy.
Summary
  • Introduction:

    Recent advances in object detection are driven by the success of region proposal methods (e.g., [22]) and region-based convolutional neural networks (R-CNNs) [6].
  • Although region-based CNNs were computationally expensive as originally developed in [6], their cost has been drastically reduced thanks to sharing convolutions across proposals [7, 5].
  • Selective Search (SS) [22], one of the most popular methods, greedily merges superpixels based on engineered low-level features.
  • The region proposal step still consumes as much running time as the detection network.
  • Methods:

    The authors comprehensively evaluate the method on the PASCAL VOC 2007 detection benchmark [4].
  • This dataset consists of about 5k trainval images and 5k test images over 20 object categories.
  • The authors provide results in the PASCAL VOC 2012 benchmark for a few models.
  • The authors primarily evaluate detection mean Average Precision (mAP), because this is the actual metric for object detection.
  • Conclusion:

    The authors have presented Region Proposal Networks (RPNs) for efficient and accurate region proposal generation.
  • By sharing convolutional features with the down-stream detection network, the region proposal step is nearly cost-free.
  • The authors' method enables a unified, deep-learning-based object detection system to run at 5-17 fps.
  • The learned RPN improves region proposal quality and the overall object detection accuracy.
Tables
  • Table 1: Detection results on the PASCAL VOC 2007 test set (trained on VOC 2007 trainval). The detectors are Fast R-CNN with the ZF net, trained and tested using various region proposal methods. For Selective Search (SS) [22], about 2k proposals are generated.
  • Table 2: Detection results on the PASCAL VOC 2007 test set. The detector is Fast R-CNN with VGG-16.
  • Table 3: Detection results on the PASCAL VOC 2012 test set. The detector is Fast R-CNN with VGG-16.
  • Table 4: Timing (ms) on a K40 GPU, except that SS proposals are evaluated on a CPU. "Region-wise" includes NMS, pooling, fully-connected, and softmax layers.
  • Table 5: One-Stage Detection vs. Two-Stage Proposal + Detection. Detection results are on the PASCAL VOC 2007 test set using the ZF model.
Related Work
  • Several recent papers have proposed ways of using deep networks for locating class-specific or class-agnostic bounding boxes [21, 18, 3, 20]. In the OverFeat method [18], a fully-connected (fc) layer is trained to predict the box coordinates for the localization task that assumes a single object. The fc layer is then turned into a conv layer for detecting multiple class-specific objects. The MultiBox methods [3, 20] generate region proposals from a network whose last fc layer simultaneously predicts multiple (e.g., 800) boxes, which are used for R-CNN [6] object detection. Their proposal network is applied on a single image crop or multiple large image crops (e.g., 224×224) [20]. We discuss OverFeat and MultiBox in more depth later in context with our method.
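The fc-to-conv conversion used by OverFeat can be illustrated with a toy, single-channel example (a hypothetical helper, not OverFeat's code): an fc layer trained on flattened k×k inputs, when its weights are slid over a larger feature map, behaves exactly like a k×k convolution and thus produces one prediction per spatial position.

```python
def fc_as_conv(fmap, w, k):
    """Apply an fc layer trained on flattened k*k patches as a k*k conv.

    fmap: 2D list (single-channel feature map, toy example).
    w:    flat weight list of length k*k for one output unit.
    Returns a 2D list with one dot product per valid sliding position.
    """
    H, W = len(fmap), len(fmap[0])
    out = []
    for i in range(H - k + 1):
        row = []
        for j in range(W - k + 1):
            # Flatten the local patch exactly as the fc layer's input was.
            patch = [fmap[i + a][j + b] for a in range(k) for b in range(k)]
            row.append(sum(p * q for p, q in zip(patch, w)))
        out.append(row)
    return out

fmap = [[1, 2], [3, 4]]
w = [1, 0, 0, 1]  # picks top-left + bottom-right of each 2x2 patch
print(fc_as_conv(fmap, w, 2))  # [[5]] -- identical to the fc dot product
```

On a map exactly of size k×k this reduces to the original fc layer; on larger maps it yields a dense grid of outputs, which is the mechanism OverFeat exploits for multi-location detection.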
Study Subjects and Analysis
Positive samples: 128
  • Instead, we randomly sample 256 anchors in an image to compute the loss function of a mini-batch, where the sampled positive and negative anchors have a ratio of up to 1:1. If there are fewer than 128 positive samples in an image, we pad the mini-batch with negative ones. We randomly initialize all new layers by drawing weights from a zero-mean Gaussian distribution with standard deviation 0.01.
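The sampling rule above can be sketched as follows. This is a hypothetical helper that assumes anchor labels (positive/negative index lists) have already been computed; it is not the paper's code, only an illustration of the 256-anchor mini-batch with up-to-1:1 ratio and negative padding.

```python
import random

def sample_anchor_minibatch(positive_idx, negative_idx,
                            batch_size=256, max_positives=128, seed=0):
    """Sample anchor indices with up to a 1:1 positive:negative ratio.

    If an image has fewer than `max_positives` positive anchors, the
    mini-batch is padded with extra negatives so it still contains
    `batch_size` anchors in total.
    """
    rng = random.Random(seed)
    n_pos = min(max_positives, len(positive_idx))
    pos = rng.sample(positive_idx, n_pos)
    neg = rng.sample(negative_idx, batch_size - n_pos)
    return pos, neg

# Example: only 40 positive anchors exist, so 216 negatives pad the batch.
pos, neg = sample_anchor_minibatch(list(range(40)), list(range(40, 5000)))
print(len(pos), len(neg))  # 40 216
```

Capping positives at 128 keeps the loss from being dominated by the far more numerous negative anchors, which would otherwise bias the classifier toward predicting "background".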

References
  • N. Chavali, H. Agrawal, A. Mahendru, and D. Batra. Object-Proposal Evaluation Protocol is 'Gameable'. arXiv:1505.05836, 2015.
  • J. Dai, K. He, and J. Sun. Convolutional feature masking for joint object and stuff segmentation. In CVPR, 2015.
  • D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks. In CVPR, 2014.
  • M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results, 2007.
  • R. Girshick. Fast R-CNN. arXiv:1504.08083, 2015.
  • R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
  • K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV, 2014.
  • J. Hosang, R. Benenson, P. Dollár, and B. Schiele. What makes for effective detection proposals? arXiv:1502.05082, 2015.
  • J. Hosang, R. Benenson, and B. Schiele. How good are detection proposals, really? In BMVC, 2014.
  • Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv:1408.5093, 2014.
  • A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
  • Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1989.
  • K. Lenc and A. Vedaldi. R-CNN minus R. arXiv:1506.06981, 2015.
  • J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
  • V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In ICML, 2010.
  • S. Ren, K. He, R. Girshick, X. Zhang, and J. Sun. Object detection networks on convolutional feature maps. arXiv:1504.06066, 2015.
  • O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575, 2014.
  • P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. In ICLR, 2014.
  • K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  • C. Szegedy, S. Reed, D. Erhan, and D. Anguelov. Scalable, high-quality object detection. arXiv:1412.1441v2, 2015.
  • C. Szegedy, A. Toshev, and D. Erhan. Deep neural networks for object detection. In NIPS, 2013.
  • J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders. Selective search for object recognition. IJCV, 2013.
  • M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional neural networks. In ECCV, 2014.
  • C. L. Zitnick and P. Dollár. Edge boxes: Locating object proposals from edges. In ECCV, 2014.