Rethinking the Faster R-CNN Architecture for Temporal Action Localization

Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A. Ross, Jia Deng, Rahul Sukthankar

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018)

Abstract

We propose TAL-Net, an improved approach to temporal action localization in video that is inspired by the Faster R-CNN object detection framework. TAL-Net addresses three key shortcomings of existing approaches: (1) we improve receptive field alignment using a multi-scale architecture that can accommodate extreme variation in action durations; (2) we better exploit the temporal context of actions for both proposal generation and action classification by appropriately extending receptive fields; and (3) we explicitly consider multi-stream feature fusion and demonstrate that fusing motion late is important. We achieve state-of-the-art performance for both action proposal and localization on the THUMOS'14 detection benchmark and competitive performance on the ActivityNet challenge.

Keywords

action proposal, temporal action localization, receptive field alignment, multiscale architecture, action durations, proposal generation, action classification, multistream feature fusion, R-CNN architecture, RCNN object detection framework, TAL-Net, ActivityNet challenge
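
Two of the ideas named in the abstract, extending a segment's receptive field to capture temporal context and fusing streams late, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `extend_segment` and `late_fuse` are hypothetical, and the padding ratio of 0.5 and the use of a plain average for fusion are assumptions chosen for illustration, since the abstract does not specify them.

```python
from typing import List, Tuple


def extend_segment(start: float, end: float, ratio: float = 0.5) -> Tuple[float, float]:
    """Widen a temporal segment on both sides to include surrounding context.

    `ratio` is an assumed hyperparameter: the padding added on each side,
    expressed as a fraction of the segment length. The start is clamped at 0.
    """
    pad = ratio * (end - start)
    return max(0.0, start - pad), end + pad


def late_fuse(rgb_scores: List[float], flow_scores: List[float]) -> List[float]:
    """Combine per-class scores from two streams that were processed separately.

    Averaging is an assumption made for this sketch; the point is only that
    fusion happens on the outputs rather than on early features.
    """
    if len(rgb_scores) != len(flow_scores):
        raise ValueError("both streams must score the same set of classes")
    return [(r + f) / 2.0 for r, f in zip(rgb_scores, flow_scores)]
```

For example, under these assumptions a segment spanning 10 to 20 seconds would be scored using features from the window 5 to 25 seconds, and the RGB and motion streams would each produce their own class scores before being combined.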
