Action recognition with multi-scale trajectory-pooled 3D convolutional descriptors
Multimedia Tools Appl. (2017)
Abstract
Hand-crafted and learning-based features are the two main types of video representation in video understanding. How to combine their merits into a good descriptor has recently become a research hotspot. Motivated by TDD (Wang et al. 2015), we combine the trajectory pooling method with 3D ConvNets (Tran et al. 2015) and propose a novel multi-scale trajectory-pooled 3D convolutional descriptor (MTC3D) for action recognition. Specifically, we compute multi-scale dense trajectories from the input video and perform trajectory pooling on the feature maps of a 3D CNN. The proposed descriptor has two advantages: the 3D CNN can extract high-level semantic information from videos, and the multi-scale trajectory pooling method makes good use of their temporal information. Experiments on the HMDB51 and UCF101 datasets demonstrate that the proposed descriptor achieves state-of-the-art results.
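The abstract does not spell out the pooling step, so the following is only a minimal sketch of the general idea of trajectory pooling on a 3D CNN feature map, loosely following the sum-pooling used in TDD. The function name `trajectory_pool`, the `(T, H, W, C)` layout, and the `spatial_stride` and `temporal_stride` parameters are assumptions made for illustration, not the authors' implementation. For the multi-scale case, one plausible reading is that the same pooling is applied to the trajectories extracted at each scale.

```python
import numpy as np


def trajectory_pool(feature_map, trajectory, spatial_stride, temporal_stride):
    """Sum-pool 3D CNN features along a single trajectory (illustrative sketch).

    feature_map: array of shape (T, H, W, C), one layer's output (assumed layout).
    trajectory:  array of shape (L, 3) holding (x, y, t) points in video coordinates.
    spatial_stride, temporal_stride: hypothetical ratios between video
        coordinates and feature-map coordinates.
    """
    t_size, h_size, w_size, _ = feature_map.shape
    traj = np.asarray(trajectory, dtype=float)

    # Map each video-space point to the nearest feature-map cell and keep
    # the indices inside the map.
    xs = np.clip(np.rint(traj[:, 0] / spatial_stride).astype(int), 0, w_size - 1)
    ys = np.clip(np.rint(traj[:, 1] / spatial_stride).astype(int), 0, h_size - 1)
    ts = np.clip(np.rint(traj[:, 2] / temporal_stride).astype(int), 0, t_size - 1)

    # Gather one C-dimensional vector per trajectory point, then sum them.
    return feature_map[ts, ys, xs].sum(axis=0)


# Toy usage: a 2x2x2 feature map with a single channel and a 3-point trajectory.
toy_map = np.arange(8, dtype=float).reshape(2, 2, 2, 1)
toy_trajectory = np.array([[0, 0, 0], [2, 0, 0], [2, 2, 2]])
descriptor = trajectory_pool(toy_map, toy_trajectory, spatial_stride=2, temporal_stride=2)
```

The result is one vector with as many entries as the feature map has channels, so descriptors from trajectories of different lengths remain directly comparable.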
Keywords
Trajectory pooling, 3D ConvNets, Action recognition