VLAD³: Encoding Dynamics of Deep Features for Action Recognition

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Cited 65 | Views 52
Abstract
Previous approaches to action recognition with deep features tend to process video frames only within a small temporal region, and do not model long-range dynamic information explicitly. However, such information is important for the accurate recognition of actions, especially for the discrimination of complex activities that share sub-actions, and when dealing with untrimmed videos. Here, we propose a representation, VLAD for Deep Dynamics (VLAD³), that accounts for different levels of video dynamics. It captures short-term dynamics with deep convolutional neural network features, relying on linear dynamic systems (LDS) to model medium-range dynamics. To account for long-range inhomogeneous dynamics, a VLAD descriptor is derived for the LDS and pooled over the whole video, to arrive at the final VLAD³ representation. An extensive evaluation was performed on Olympic Sports, UCF101 and THUMOS15, where the use of the VLAD³ representation leads to state-of-the-art results.
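The abstract describes a three-level pipeline: per-frame CNN features for short-term dynamics, an LDS per temporal window for medium-range dynamics, and VLAD pooling of the window-level descriptors for the video-level representation. The following is a minimal, illustrative Python sketch of that pipeline, not the authors' implementation: it assumes per-frame CNN features are already available as a matrix, fits each window's LDS with the standard SVD-based subspace method, describes a window by its flattened (A, C) parameters (a simplification; the paper derives a VLAD descriptor tailored to LDS models), and pools one VLAD vector per video against a stand-in codebook. All function names, window sizes, and dimensions below are illustrative assumptions.

```python
import numpy as np

def fit_lds(Y, n_states=10):
    # Fit a linear dynamic system x_{t+1} = A x_t, y_t = C x_t to a window of
    # per-frame CNN features Y (feature_dim x window_len) with the standard
    # SVD-based subspace method.
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                              # observation matrix (d x n)
    X = np.diag(S[:n_states]) @ Vt[:n_states]        # estimated state sequence (n x tau)
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])         # transition matrix via least squares
    return A, C

def vlad_encode(descriptors, centers):
    # Standard VLAD pooling: accumulate residuals of descriptors to their
    # nearest codebook center, then apply power and L2 normalization.
    assign = np.argmin(((descriptors[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    vlad = np.zeros_like(centers)
    for k in range(len(centers)):
        if np.any(assign == k):
            vlad[k] = (descriptors[assign == k] - centers[k]).sum(axis=0)
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))     # power normalization
    return vlad / (np.linalg.norm(vlad) + 1e-12)     # L2 normalization

# Illustrative usage on random stand-ins for per-frame CNN features.
rng = np.random.default_rng(0)
frames = rng.standard_normal((512, 120))             # 512-d features for 120 frames
window, stride = 30, 15

# Medium-range level: one LDS per temporal window, described here (as a
# simplification) by its flattened (A, C) parameters.
window_descs = []
for t in range(0, frames.shape[1] - window + 1, stride):
    A, C = fit_lds(frames[:, t:t + window])
    window_descs.append(np.concatenate([A.ravel(), C.ravel()]))
window_descs = np.stack(window_descs)

# Long-range level: pool one VLAD vector per video; a real system would learn
# the codebook (e.g. with k-means) on training-set window descriptors.
codebook = window_descs[rng.choice(len(window_descs), size=4, replace=False)]
video_representation = vlad_encode(window_descs, codebook)
print(video_representation.shape)
```

In this sketch the video-level vector would be fed to a classifier (e.g. a linear SVM); the codebook size, window length, and state dimension are arbitrary choices made only to keep the example small.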
Keywords
action recognition, VLAD³ approach, video dynamics, deep features, linear dynamic systems, LDS, Olympic Sports dataset, UCF101 dataset, THUMOS15 dataset, VLAD for deep dynamics approach