Action Recognition in Videos with Spatio-Temporal Fusion 3D Convolutional Neural Networks

PATTERN RECOGNITION AND IMAGE ANALYSIS (2021)

Abstract
Traditional human action recognition algorithms based on feature extraction are complicated, which leads to low recognition accuracy. We present an algorithm for recognizing human actions in videos based on spatio-temporal fusion with 3D convolutional neural networks (3D CNNs). The algorithm contains two subnetworks that extract deep spatial information and deep temporal information, respectively, and a bilinear fusion policy is applied to obtain the final fused spatio-temporal information. Spatial information is represented by a gradient feature, and temporal information is represented by optical flow. A new 3D CNN is then constructed on the fused spatio-temporal information to extract deep features from multiple perspectives. The proposed algorithm is compared with current mainstream algorithms on the KTH and UCF101 datasets, and the results demonstrate its effectiveness and high recognition accuracy.
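The abstract does not spell out how the two streams are combined, so the following is only a minimal sketch of standard bilinear pooling as used in bilinear CNNs, not the authors' exact formulation. It assumes each subnetwork outputs features at the same L spatio-temporal locations (frames, height and width flattened), with C1 channels for the gradient stream and C2 channels for the optical-flow stream. The function name `bilinear_fuse`, the averaging over locations, and the signed-square-root plus L2 normalisation steps are all assumptions for illustration.

```python
import math
from typing import List

# Hypothetical layout: one row per spatio-temporal location, one column per channel.
Matrix = List[List[float]]


def bilinear_fuse(spatial: Matrix, temporal: Matrix) -> List[float]:
    """Sketch of bilinear pooling of two feature maps (assumed formulation).

    spatial:  L x C1 features from the gradient (spatial) stream.
    temporal: L x C2 features from the optical-flow (temporal) stream,
              taken at the same L locations.
    Returns a flattened, normalised C1*C2 fused descriptor.
    """
    if not spatial or len(spatial) != len(temporal):
        raise ValueError("both streams must cover the same, non-empty set of locations")
    c1, c2 = len(spatial[0]), len(temporal[0])
    fused = [0.0] * (c1 * c2)
    # Accumulate the outer product x_s(l) * x_t(l)^T at every location l.
    for xs, xt in zip(spatial, temporal):
        for i in range(c1):
            for j in range(c2):
                fused[i * c2 + j] += xs[i] * xt[j]
    # Average over locations so the descriptor does not scale with clip size.
    n = len(spatial)
    fused = [v / n for v in fused]
    # Signed square root, then L2 normalisation (common bilinear-CNN post-processing).
    fused = [math.copysign(math.sqrt(abs(v)), v) for v in fused]
    norm = math.sqrt(sum(v * v for v in fused))
    return [v / norm for v in fused] if norm > 0 else fused


if __name__ == "__main__":
    # Two locations, 2 spatial channels and 3 temporal channels (toy values).
    spatial = [[1.0, 0.0], [0.0, 1.0]]
    temporal = [[2.0, 0.0, 1.0], [0.0, 3.0, 0.0]]
    print(bilinear_fuse(spatial, temporal))
```

Because the abstract says a new 3D CNN is built on top of the fused information, the paper may keep the per-location outer products (a C1*C2 channel map at each location) instead of pooling them into one vector as done here; dropping the sum over locations in the sketch would give that variant.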
Keywords
video action recognition, 3D Convolutional Neural Network, spatiotemporal information, bilinear fusion