Beyond Skip Connections: Top-Down Modulation for Object Detection.

arXiv: Computer Vision and Pattern Recognition (2016)

Abstract
In recent years, we have seen tremendous progress in the field of object detection. Most of the recent improvements have been achieved by targeting deeper feedforward networks. However, many hard object categories, such as bottle and remote, require representation of fine details and not just coarse, semantic representations. But most of these fine details are lost in the early convolutional layers. What we need is a way to incorporate finer details from lower layers into the detection architecture. Skip connections have been proposed to combine high-level and low-level features, but we argue that selecting the right low-level features requires top-down contextual information. Inspired by the human visual pathway, in this paper we propose top-down modulations as a way to incorporate fine details into the detection framework. Our approach supplements the standard bottom-up, feedforward ConvNet with a top-down modulation (TDM) network, connected using lateral connections. These connections are responsible for the modulation of lower layer filters, and the top-down network handles the selection and integration of features. The proposed architecture provides a significant boost on the COCO benchmark for VGG16, ResNet101, and InceptionResNet-v2 architectures. Preliminary experiments using InceptionResNet-v2 achieve 36.8 AP, which is the best performance to date on the COCO benchmark using a single model without any bells and whistles (e.g., multi-scale, iterative box refinement, etc.).
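The combination of a bottom-up feature map with a coarser top-down feature map through a lateral connection can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify layer types, so the 1x1 channel mixing, nearest-neighbor 2x upsampling, channel concatenation, and ReLU used here are all assumptions, as are the names `conv1x1`, `upsample2x`, and `tdm_block`. Feature maps are plain nested lists shaped `[channels][height][width]` to keep the example dependency-free.

```python
def conv1x1(feat, weights):
    """Mix channels independently at each pixel; weights is [C_out][C_in]."""
    h, w = len(feat[0]), len(feat[0][0])
    return [
        [[sum(wk * feat[k][y][x] for k, wk in enumerate(row)) for x in range(w)]
         for y in range(h)]
        for row in weights
    ]


def relu(feat):
    """Element-wise max(0, v)."""
    return [[[max(0.0, v) for v in row] for row in ch] for ch in feat]


def upsample2x(feat):
    """Nearest-neighbor upsampling: each pixel becomes a 2x2 patch."""
    out = []
    for ch in feat:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in (0, 1)]
            rows.append(wide)
            rows.append(list(wide))
        out.append(rows)
    return out


def tdm_block(bottom_up, top_down, w_lateral, w_merge):
    """Hypothetical top-down modulation step (an assumption, not the paper's spec).

    bottom_up: finer feature map from the feedforward network, [C_b][H][W]
    top_down:  coarser feature map from the top-down path, [C_t][H/2][W/2]
    w_lateral: 1x1 weights for the lateral connection, [C_l][C_b]
    w_merge:   1x1 weights applied after concatenation, [C_out][C_l + C_t]
    """
    lateral = relu(conv1x1(bottom_up, w_lateral))
    upsampled = upsample2x(top_down)
    if (len(upsampled[0]), len(upsampled[0][0])) != (len(lateral[0]), len(lateral[0][0])):
        raise ValueError("top-down map must be half the size of the bottom-up map")
    merged = lateral + upsampled  # concatenate along the channel axis
    return relu(conv1x1(merged, w_merge))


# Toy example: one 2x2 bottom-up channel and one 1x1 top-down channel.
fine = [[[1.0, 2.0], [3.0, 4.0]]]
coarse = [[[10.0]]]
result = tdm_block(fine, coarse, w_lateral=[[1.0]], w_merge=[[1.0, 0.5]])
```

In this toy setup the coarse, high-level value is broadcast over the finer grid and blended with the low-level detail, which is the general idea of letting top-down context influence which fine features are passed on.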