Collaborative and Multilevel Feature Selection Network for Action Recognition

IEEE Transactions on Neural Networks and Learning Systems (2023)

Abstract
The feature pyramid has been widely used in many visual tasks, such as fine-grained image classification, instance segmentation, and object detection, and has achieved promising performance. Although many algorithms exploit different-level features to construct the feature pyramid, they usually treat these features equally and do not investigate in depth the inherent complementary advantages of different-level features. In this article, to learn a pyramid feature with robust representational ability for action recognition, we propose a novel collaborative and multilevel feature selection network (FSNet) that applies feature selection and aggregation to multilevel features according to action context. Unlike previous works that learn the pattern of frame appearance by enhancing spatial encoding, the proposed network consists of a position selection module and a channel selection module that adaptively aggregate multilevel features into a new informative feature along both the position and channel dimensions. The position selection module integrates the vectors at the same spatial location across multilevel features with positionwise attention. Similarly, the channel selection module selectively aggregates the channel maps at the same channel location across multilevel features with channelwise attention. Positionwise features with different receptive fields and channelwise features with different pattern-specific responses are each emphasized according to their correlations to actions and are then fused into a new informative feature for action recognition. The proposed FSNet can be flexibly inserted into different backbone networks, and extensive experiments are conducted on three benchmark action datasets: Kinetics, UCF101, and HMDB51. Experimental results show that FSNet is practical and can be collaboratively trained to boost the representational ability of existing networks. FSNet achieves superior performance against most top-tier models on Kinetics and all models on UCF101 and HMDB51.
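
The two selection modules described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how attention scores are computed or how the two branches are fused, so the linear scoring weights (w_pos, w_ch), the use of global average pooling, the softmax across levels, and the additive fusion are all assumptions made for illustration. The sketch also assumes the multilevel features have already been resized to a common shape (levels, channels, height, width).

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def position_select(feats, w_pos):
    """Positionwise attention across levels (illustrative only).

    feats: (L, C, H, W) multilevel features, assumed pre-aligned.
    w_pos: (L, C) hypothetical scoring weights, one vector per level.
    Returns a (C, H, W) feature.
    """
    # One score per level and spatial location.
    scores = np.einsum("lchw,lc->lhw", feats, w_pos)
    # Normalize across levels so the weights at each location sum to 1.
    attn = softmax(scores, axis=0)
    return (attn[:, None, :, :] * feats).sum(axis=0)


def channel_select(feats, w_ch):
    """Channelwise attention across levels (illustrative only).

    w_ch: (L, C) hypothetical per-channel scales applied to pooled responses.
    Returns a (C, H, W) feature.
    """
    # Global average pooling gives one response per level and channel.
    pooled = feats.mean(axis=(2, 3))
    # Normalize across levels so the weights for each channel sum to 1.
    attn = softmax(pooled * w_ch, axis=0)
    return (attn[:, :, None, None] * feats).sum(axis=0)


def fuse(feats, w_pos, w_ch):
    """Assumed fusion: elementwise sum of the two selected features."""
    return position_select(feats, w_pos) + channel_select(feats, w_ch)


rng = np.random.default_rng(0)
L, C, H, W = 3, 8, 4, 4
feats = rng.standard_normal((L, C, H, W))
w_pos = rng.standard_normal((L, C))
w_ch = rng.standard_normal((L, C))
fused = fuse(feats, w_pos, w_ch)
print(fused.shape)  # (8, 4, 4)
```

In both branches the attention weights are normalized over the level axis, so each output location (or channel) is a convex combination of the corresponding entries from the different levels. In a trained network the scoring weights would be learned jointly with the backbone rather than sampled at random.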
Keywords
Action recognition, feature selection, multilevel feature, spatiotemporal feature