Semantic R-CNN for Natural Language Object Detection.

Advances in Multimedia Information Processing - PCM 2017, Pt. II (2017)

Abstract
In this paper, we present a simple and effective framework for natural language object detection, which localizes a target within an image based on a description of that target. The method, called Semantic R-CNN, extends the RPN (Region Proposal Network) [1] by adding an LSTM [20] module that processes the natural language query text. The LSTM [20] module takes the encoded query text and image descriptors as input. It outputs the probability of the query text conditioned on the visual features of a candidate box and of the whole image. The candidate boxes are generated by the RPN, and their local features are extracted by ROI pooling. The RPN can be initialized from a pre-trained Faster R-CNN model [1], which transfers visual knowledge about objects from the traditional object detection domain to our task. Experimental results demonstrate that our method significantly outperforms the previous baseline, the SCRC (Spatial Context Recurrent ConvNet) [7] model, on the ReferIt dataset [8]. Moreover, our model is as simple to train as Faster R-CNN.
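To make the inference step concrete, here is a minimal sketch of how candidate boxes could be ranked by the conditional likelihood of the query. It is written under stated assumptions and is not the authors' implementation. `roi_max_pool` is a simplified single-channel ROI max pooling on a toy feature grid. `token_prob` is a hypothetical stand-in for the trained LSTM's per-word softmax output, conditioned on the local (box) and global (whole-image) features. `toy_token_prob` is an invented placeholder used only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

# (x0, y0, x1, y1) on the feature grid; x1/y1 are exclusive.
Box = Tuple[int, int, int, int]
# Hypothetical signature: p(word | previous words, box feature, image feature).
TokenProb = Callable[[Sequence[str], str, List[float], List[float]], float]


def roi_max_pool(fmap: List[List[float]], box: Box, out_h: int, out_w: int) -> List[List[float]]:
    """Max-pool the region `box` of a single-channel map into an out_h x out_w grid."""
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Floor the bin start and ceil the bin end so no bin is empty.
        ys = y0 + (i * h) // out_h
        ye = y0 + -(-((i + 1) * h) // out_h)
        row = []
        for j in range(out_w):
            xs = x0 + (j * w) // out_w
            xe = x0 + -(-((j + 1) * w) // out_w)
            row.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


def query_log_likelihood(query: Sequence[str], local_feat: List[float],
                         global_feat: List[float], token_prob: TokenProb) -> float:
    """log p(query | box, image) = sum_t log p(w_t | w_<t, box, image)."""
    return sum(math.log(token_prob(query[:t], word, local_feat, global_feat))
               for t, word in enumerate(query))


def localize(query: Sequence[str], boxes: List[Box], box_feats: List[List[float]],
             global_feat: List[float], token_prob: TokenProb):
    """Return the candidate box whose features make the query most likely, plus all scores."""
    scores = [query_log_likelihood(query, f, global_feat, token_prob) for f in box_feats]
    best = max(range(len(boxes)), key=lambda k: scores[k])
    return boxes[best], scores


def toy_token_prob(prefix: Sequence[str], word: str,
                   local: List[float], glob: List[float]) -> float:
    # Hypothetical placeholder for the LSTM: brighter regions explain any word better.
    score = sum(local) / len(local) + 0.1 * sum(glob) / len(glob)
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    fmap = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    # In the paper the proposals come from the RPN; here they are fixed by hand.
    boxes = [(0, 0, 2, 2), (2, 2, 4, 4)]
    feats = [[v for row in roi_max_pool(fmap, b, 1, 1) for v in row] for b in boxes]
    glob = [v for row in roi_max_pool(fmap, (0, 0, 4, 4), 1, 1) for v in row]
    best, scores = localize(["red", "car", "left"], boxes, feats, glob, toy_token_prob)
    print(best)  # → (2, 2, 4, 4)
```

Scoring in log space is a design choice of this sketch. It turns the product of per-word probabilities into a sum and avoids numerical underflow on long queries.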
Keywords
Object detection, Natural language, RPN