Modeling Long- and Short-Term Temporal Context for Video Object Detection

2019 IEEE International Conference on Image Processing (ICIP)

Abstract
Video object detection improves on still-image object detection by exploiting temporal context from neighboring frames. Most state-of-the-art video object detectors are non-causal and require many preceding and succeeding frames, which makes them impractical for real-time online detection, where succeeding frames are not available. In this paper, we propose a causal, recurrent, flow-based method for online video object detection. At each time step, the proposed method reads only the current frame and one preceding frame from a memory buffer. Two types of temporal context are exploited. Short-term temporal context is captured by warping the feature map of the nearby preceding frame according to the optical flow. Long-term temporal context is extracted with a convolutional LSTM, in which informative features from distant preceding frames are stored and propagated through time. By aggregating both long- and short-term temporal context, our proposed method achieves competitive performance (75.5% mAP) on the ImageNet VID dataset at a relatively high speed.
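The abstract describes a per-frame pipeline: flow-based warping of the previous feature map supplies short-term context, a convolutional LSTM carries long-term context, and the two are aggregated before detection. Below is a minimal PyTorch sketch of what one causal time step could look like. It is an illustration under stated assumptions, not the authors' implementation: the backbone, flow estimator, fusion layer, and all module names (warp, ConvLSTMCell, OnlineDetectorStep) are hypothetical stand-ins.

```python
# Hypothetical sketch of one causal time step (assumed PyTorch modules;
# the paper's actual backbone, flow network, and head are not specified here).
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(feat, flow):
    """Warp a feature map by an optical-flow field via bilinear sampling."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)   # (2, H, W)
    coords = grid.unsqueeze(0) + flow                             # displace by flow
    # Normalize sampling coordinates to [-1, 1] for grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)       # (N, H, W, 2)
    return F.grid_sample(feat, sample_grid, align_corners=True)

class ConvLSTMCell(nn.Module):
    """Convolutional LSTM cell storing long-term temporal context."""
    def __init__(self, channels):
        super().__init__()
        self.gates = nn.Conv2d(2 * channels, 4 * channels, 3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

class OnlineDetectorStep(nn.Module):
    """One causal step: reads only the current frame and one buffered frame."""
    def __init__(self, channels=64):
        super().__init__()
        self.backbone = nn.Conv2d(3, channels, 3, padding=1)  # stand-in backbone
        self.flow_net = nn.Conv2d(6, 2, 3, padding=1)         # stand-in flow estimator
        self.lstm = ConvLSTMCell(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)      # aggregation layer

    def forward(self, cur_frame, prev_frame, prev_feat, state):
        cur_feat = self.backbone(cur_frame)
        # Short-term context: warp the preceding feature map along the flow.
        flow = self.flow_net(torch.cat([cur_frame, prev_frame], dim=1))
        short_term = warp(prev_feat, flow)
        # Long-term context: propagate features from distant frames via the LSTM.
        long_term, state = self.lstm(cur_feat, state)
        fused = self.fuse(torch.cat([short_term, long_term], dim=1))
        return fused, cur_feat, state  # fused features would feed a detection head
```

In such a setup the LSTM state and the buffered feature map would be initialized from the first frame (e.g., zero state plus the first frame's backbone features); every subsequent step touches only the current frame and the single buffered preceding frame, which is what keeps the detector causal and suitable for online use.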
Keywords
Video object detection, deep learning, optical flow, LSTM, context modeling