Improving self-supervised action recognition from extremely augmented skeleton sequences

Pattern Recognition (2024)

Abstract
Self-supervised contrastive learning has been widely applied to skeleton-based action recognition due to its ability to learn discriminative features. However, directly applying existing contrastive learning frameworks to 3D skeleton learning is limited by their reliance on carefully designed augmentations and by simple decision-level fusion of multiple streams. To address these drawbacks, we propose a three-stream contrastive learning framework utilizing abundant information mining for self-supervised action representation (3s-AimCLR++). For single-stream contrastive learning, we first propose extreme augmentation, which introduces more movement patterns and thereby improves the universality of the learned representations. Since directly using extreme augmentation barely improves performance because of the drastic changes to the original identity, we propose the Distributional Divergence Minimization (DDM) loss to exploit extreme augmentation more gently. Moreover, we propose Single-Stream Nearest Neighbors Mining (SNNM) to expand the set of positive samples and make the learning process more reasonable. For multi-stream learning, existing methods simply ensemble the per-stream results. Considering the complementarity of information across streams, we instead propose a Multi-Stream Aggregation and Interaction (MSAI) strategy to better fuse multi-stream information. Extensive experiments on the NTU-60, NTU-120, and PKU-MMD datasets verify that 3s-AimCLR++ performs favorably against state-of-the-art methods under a variety of evaluation protocols. The code and models are available at https://github.com/Levigty/AimCLR-v2.
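The abstract names the DDM loss and nearest-neighbor mining without giving their formulas. The following is only a minimal sketch of the general idea, under stated assumptions: the similarity distribution is taken as a softmax over the query's similarity to one positive key and to a memory bank, the divergence is taken as a cross-entropy with the normally augmented view as the target, and the temperature of 0.07 is a placeholder. All function names are hypothetical and are not taken from the authors' repository.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # L2-normalize an embedding; fall back to 1.0 for an all-zero vector.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_distribution(query, key, bank, temperature=0.07):
    # Assumed form: softmax over similarity to the positive key (index 0)
    # followed by similarities to every memory-bank entry.
    logits = [dot(query, key) / temperature]
    logits += [dot(query, m) / temperature for m in bank]
    return softmax(logits)

def ddm_loss(p_normal, p_extreme, eps=1e-12):
    # Assumed form: cross-entropy that pulls the extreme view's distribution
    # toward the normal view's distribution, which acts as a fixed target.
    return -sum(p * math.log(q + eps) for p, q in zip(p_normal, p_extreme))

def mine_nearest_neighbors(query, bank, k=1):
    # Indices of the k bank entries most similar to the query; these could
    # be treated as additional positives.
    sims = [dot(query, m) for m in bank]
    order = sorted(range(len(bank)), key=lambda i: sims[i], reverse=True)
    return order[:k]

# Toy 3-dimensional embeddings, purely illustrative.
bank = [normalize(v) for v in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])]
key = normalize([0.9, 0.1, 0.0])
q_normal = normalize([1.0, 0.2, 0.0])    # embedding of the normally augmented view
q_extreme = normalize([0.6, 0.5, 0.3])   # embedding of the extremely augmented view

p_normal = similarity_distribution(q_normal, key, bank)
p_extreme = similarity_distribution(q_extreme, key, bank)
loss = ddm_loss(p_normal, p_extreme)
neighbors = mine_nearest_neighbors(q_normal, bank, k=1)
```

In this sketch the extreme view never serves as a hard positive; it only has to match the softer similarity distribution of the normal view, which is one way to read "utilize the extreme augmentation more gently".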
Keywords
Self-supervised skeleton-based action recognition, Contrastive learning