Embedding Sequential Information Into Spatiotemporal Features For Action Recognition
2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2016
Abstract
In this paper, we introduce a novel framework for video-based action recognition, which incorporates sequential information into the spatiotemporal features. Specifically, spatiotemporal features are extracted from sliced clips of a video, and a recurrent neural network is then applied to embed the sequential information into the final feature representation of the video. In contrast to most current deep learning methods for video-based tasks, our framework incorporates both the long-term dependencies and the spatiotemporal information of the clips in the video. To extract the spatiotemporal features from the clips, both dense trajectories (DT) and a recently proposed 3D convolutional network, C3D, are used in our experiments. Our proposed framework is evaluated on the benchmark datasets UCF101 and HMDB51, and achieves performance comparable to state-of-the-art results.
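The core idea of the abstract can be sketched as a recurrent pass over per-clip features: each clip of a video yields one spatiotemporal feature vector (e.g. from C3D or encoded dense trajectories), and a recurrent network folds the clip sequence into a single video-level representation. The sketch below is a minimal illustration with a vanilla tanh RNN in NumPy; the dimensions, weights, and function names are illustrative assumptions, not the paper's actual architecture or settings.

```python
import numpy as np

def rnn_embed(clip_features, W_xh, W_hh, b_h):
    """Run a vanilla tanh RNN over clip features; the last hidden state
    serves as the video-level representation (illustrative sketch)."""
    h = np.zeros(W_hh.shape[0])
    for x in clip_features:               # one feature vector per clip
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h

rng = np.random.default_rng(0)
feat_dim, hid_dim, n_clips = 8, 4, 5      # toy sizes, not from the paper
clips = rng.standard_normal((n_clips, feat_dim))   # stand-in for C3D/DT features
W_xh = rng.standard_normal((hid_dim, feat_dim)) * 0.1
W_hh = rng.standard_normal((hid_dim, hid_dim)) * 0.1
b_h = np.zeros(hid_dim)

video_repr = rnn_embed(clips, W_xh, W_hh, b_h)
print(video_repr.shape)  # (4,) — one fixed-size vector per video
```

In practice the recurrent cell would be a trained GRU or LSTM and the per-clip features would come from a pretrained extractor, but the shape of the computation (clip sequence in, one fixed-size video vector out) is the same.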
Keywords
video-based action recognition, spatiotemporal features, sequential information, feature extraction, recurrent neural network, feature representation, dense trajectories, 3D neural network, C3D