Flow-Guided Feature Aggregation for Video Object Detection

2017 IEEE International Conference on Computer Vision (ICCV)

Citations 745 | Views 347
Abstract
Extending state-of-the-art object detectors from image to video is challenging. Detection accuracy suffers from degenerated object appearances in videos, e.g., motion blur, video defocus, rare poses, etc. Existing work attempts to exploit temporal information at the box level, but such methods are not trained end-to-end. We present flow-guided feature aggregation, an accurate and end-to-end learning framework for video object detection. It instead leverages temporal coherence at the feature level: it improves the per-frame features by aggregating nearby features along the motion paths, and thus improves video recognition accuracy. Our method significantly improves upon strong single-frame baselines on ImageNet VID [33], especially for the more challenging fast-moving objects. Our framework is principled, and on par with the best engineered systems winning the ImageNet VID challenges 2016, without additional bells and whistles. The code will be released.
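The aggregation idea in the abstract — warp each nearby frame's feature map to the reference frame along estimated motion, then combine the warped maps with per-pixel adaptive weights — can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: the flow fields are assumed given (the paper estimates them with a flow network), and cosine similarity with a softmax is assumed for the adaptive weights.

```python
import numpy as np

def warp(feat, flow):
    """Bilinearly warp a feature map feat (C, H, W) by flow (2, H, W).

    flow[0]/flow[1] give the x/y displacement from each reference-frame
    pixel to its corresponding location in the neighbor frame.
    """
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_x = np.clip(xs + flow[0], 0, W - 1)
    src_y = np.clip(ys + flow[1], 0, H - 1)
    # Corner indices and bilinear interpolation weights.
    x0 = np.floor(src_x).astype(int); x1 = np.clip(x0 + 1, 0, W - 1)
    y0 = np.floor(src_y).astype(int); y1 = np.clip(y0 + 1, 0, H - 1)
    wx, wy = src_x - x0, src_y - y0
    return (feat[:, y0, x0] * (1 - wy) * (1 - wx)
          + feat[:, y0, x1] * (1 - wy) * wx
          + feat[:, y1, x0] * wy * (1 - wx)
          + feat[:, y1, x1] * wy * wx)

def aggregate(ref_feat, neighbor_feats, flows):
    """Flow-guided aggregation sketch: warp neighbor features to the
    reference frame, weight them per pixel by (softmaxed) cosine
    similarity to the reference feature, and sum.
    """
    warped = [warp(f, fl) for f, fl in zip(neighbor_feats, flows)]
    warped.append(ref_feat)  # the reference frame itself also contributes

    def cos(a, b):
        num = (a * b).sum(axis=0)
        den = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + 1e-8
        return num / den

    sims = np.stack([cos(w, ref_feat) for w in warped])        # (K, H, W)
    weights = np.exp(sims) / np.exp(sims).sum(axis=0, keepdims=True)
    return (np.stack(warped) * weights[:, None]).sum(axis=0)   # (C, H, W)
```

With zero flow, `warp` is the identity, so aggregating a frame with itself returns the original features; in general, the weights down-weight warped features that disagree with the reference, which is how degenerated frames (blur, defocus) contribute less.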
Keywords
degenerated object appearances, video defocus, flow-guided feature aggregation, video object detection, feature level, per-frame features, video recognition accuracy, nearby features aggregation, ImageNet VID challenges 2016, ImageNet VID, motion paths, box level, temporal information, rare poses, motion blur, object detectors, end-to-end learning framework