Glimpse and Zoom: Spatio-Temporal Focused Dynamic Network for Skeleton-based Action Recognition

Zhifu Zhao, Ziwei Chen, Jianan Li, Xiaotian Wang, Xuemei Xie, Lei Huang, Wanxin Zhang, Guangming Shi

IEEE Transactions on Circuits and Systems for Video Technology (2024)

Abstract
GCN-based methods have achieved remarkable performance in skeleton-based action recognition. However, existing methods do not explicitly attempt to remove temporal and spatial redundancy, which may introduce additional computational cost. Inspired by the observation that humans tend to glimpse at the overall motion and then zoom into the most important spatio-temporal regions, we propose a Spatio-Temporal Focused Dynamic Network (STFD-Net), trained with reinforcement learning, for skeleton-based action recognition. Specifically, we first propose a global extractor with a Skeleton Pooling Module (SPM) that enables the network to focus on overall motion information using a refined skeleton structure. Then, a local extractor, comprising a pair-wise part partition, a tubelet proposal network, and a Partition-Grouped Module (PGM), extracts local motion details as a complement to the overall motion information. Finally, a dynamic classifier uses a recurrent neural network to terminate the process once the network is adequately confident. Extensive experiments demonstrate that the proposed network achieves state-of-the-art performance at lower computational cost on the NTU 60 and NTU 120 datasets.
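The "terminate once adequately confident" idea can be illustrated with a minimal sketch. This is not the STFD-Net implementation: the function name `classify_with_early_exit`, the `threshold` parameter, and the use of a running sum of logits in place of the recurrent classifier are all assumptions made for illustration. The first entry of `step_logits` stands for the prediction after the global glimpse, and each later entry for the prediction after zooming into one more local region.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_early_exit(step_logits, threshold=0.9):
    """Return (predicted_class, steps_used).

    Hypothetical stand-in for a dynamic classifier: logits from each
    step are accumulated (a simple substitute for a recurrent state),
    and processing stops as soon as the top class probability reaches
    `threshold`. If it never does, the prediction after the last step
    is returned.
    """
    if not step_logits:
        raise ValueError("need at least one step of logits")
    running = [0.0] * len(step_logits[0])
    for step, logits in enumerate(step_logits, start=1):
        # Fuse the new evidence with everything seen so far.
        running = [r + x for r, x in zip(running, logits)]
        probs = softmax(running)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            break  # confident enough: skip the remaining steps
    return best, step


# Example: one global glimpse followed by two zoomed-in steps.
evidence = [[0.2, 0.1, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
label, used = classify_with_early_exit(evidence, threshold=0.9)
print(label, used)
```

Under these assumptions, a lower threshold trades confidence for computation: easy samples exit after the glimpse, while harder ones consume more local steps.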
Keywords
Action Recognition, Skeleton Data, Dynamic Network, Reinforcement Learning