Global And Local Discriminative Patches Exploiting For Action Recognition

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (2020)

Abstract
Recent human action recognition models focus mainly on human features, such as pose or skeleton features. However, most of these methods pay too little attention to action-related backgrounds. In this work we propose a novel multi-stream feature fusion framework based on exploiting discriminative patches. Unlike existing part-based or attention-based multi-stream methods, our work improves recognition accuracy by (1) paying more attention to global and local discriminative patches, which include not only the acting human but also the scenes the human interacts with, and (2) proposing an effective multi-stream feature pooling and fusion mechanism, in which 2D and 3D features from RGB frames and discriminative patches are combined to strengthen the spatial-temporal feature representation. Our framework is evaluated on two widely used video action benchmarks, where it outperforms other state-of-the-art methods, reaching accuracies of up to 87.8% on HMDB51 and 98.8% on UCF101.
Keywords
Discriminative Patches, Class Activation Map, 2D and 3D ConvNets, Feature Fusion
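
The abstract does not say how global and local discriminative patches are located. Since the keywords mention class activation maps, one plausible reading can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the function names, the fixed square local window, and the relative threshold are all hypothetical, and the class activation map is assumed to be a non-negative 2D grid that has already been computed.

```python
def local_patch_box(cam, patch_size):
    """Square window of side patch_size centred on the CAM peak.

    Hypothetical helper: returns (top, left, bottom, right), shifted as
    needed so the window stays inside the map.
    """
    h, w = len(cam), len(cam[0])
    if patch_size > min(h, w):
        raise ValueError("patch_size must fit inside the activation map")
    positions = [(r, c) for r in range(h) for c in range(w)]
    # max() returns the first maximal position in row-major order.
    row, col = max(positions, key=lambda rc: cam[rc[0]][rc[1]])
    top = min(max(row - patch_size // 2, 0), h - patch_size)
    left = min(max(col - patch_size // 2, 0), w - patch_size)
    return top, left, top + patch_size, left + patch_size


def global_patch_box(cam, rel_threshold=0.5):
    """Bounding box of every cell with activation >= rel_threshold * peak.

    Hypothetical helper: assumes a non-negative map and falls back to the
    whole map when there is no positive activation.
    """
    h, w = len(cam), len(cam[0])
    peak = max(max(row) for row in cam)
    if peak <= 0:
        return 0, 0, h, w
    threshold = rel_threshold * peak
    rows = [r for r in range(h) if any(v >= threshold for v in cam[r])]
    cols = [c for c in range(w) if any(cam[r][c] >= threshold for r in range(h))]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1
```

In this sketch the local box follows the single strongest response, while the global box covers the whole region of strong responses, which may include background that the acting person interacts with. Either box would then be cropped from the RGB frame and passed to its own feature stream.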