GCM: Efficient video recognition with glance and combine module

Pattern Recognition (2023)

Abstract
• The glance and combine module (GCM), a highly efficient 3D spatio-temporal convolutional block, is proposed for video action recognition.
• GCM takes an extra glance at a higher scale to get a broader perspective of spatio-temporal features, then combines the features from the different scales.
• Ablation studies show that the proposed GCM is much more efficient than other forms of 3D spatio-temporal convolutional blocks.
• On action recognition datasets, GCM achieves SOTA performance with less than two-thirds of the computational complexity of other models.
• On a fine-grained action recognition dataset, GCM beats the previous SOTA accuracy, achieved with 2-stream methods, by more than 6% using only RGB input.
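
The glance-then-combine idea in the highlights above can be illustrated with a toy sketch. This is not the authors' GCM. It is one possible reading of "glance at a higher scale, then combine", reduced to a 1D temporal signal in plain Python. Every name here (`avg_pool`, `upsample_nearest`, `smooth3`, `glance_and_combine`) is hypothetical, the fixed 3-tap average stands in for a learned convolution, and treating the "higher scale" as a pooled, coarser view is an assumption.

```python
def avg_pool(signal, factor):
    """Downsample by averaging non-overlapping windows of `factor` samples."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


def upsample_nearest(signal, factor):
    """Repeat each sample `factor` times to return to the original length."""
    return [value for value in signal for _ in range(factor)]


def smooth3(signal):
    """3-tap moving average with replicated edges.

    Assumption: a fixed filter standing in for a learned convolution.
    """
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3
            for i in range(len(signal))]


def glance_and_combine(signal, factor=2):
    """Toy two-scale block (hypothetical, not the paper's architecture).

    Assumes len(signal) is a multiple of `factor`.
    """
    # Fine branch: filter at the original resolution.
    fine = smooth3(signal)
    # Glance branch: filter a pooled view, so the same 3-tap filter
    # covers a wider span of the original signal.
    coarse = smooth3(avg_pool(signal, factor))
    glance = upsample_nearest(coarse, factor)
    # Combine: element-wise sum of the two scales.
    return [f + g for f, g in zip(fine, glance)]


if __name__ == "__main__":
    print(glance_and_combine([0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]))
```

In this sketch the glance branch runs on a signal half as long, so it widens the receptive field at roughly half the filtering cost of the fine branch. How the actual GCM realizes its efficiency is described in the paper, not here.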
Keywords
Glance and combine module, Video action recognition, Spatio-temporal convolution, Action recognition datasets