Implicit attention-based cross-modal collaborative learning for action recognition

2023 IEEE International Conference on Image Processing (ICIP)

Abstract
Human action recognition has been an active research topic in recent years. Multiple modalities often convey heterogeneous but potentially complementary action information that a single modality does not hold. Some efforts have been devoted to exploring cross-modal representations to improve modeling capability, but with limited gains because the different modalities are fused in a simplistic way. To this end, we propose an impliCit attention-based Cross-modal Collaborative Learning (C3L) method for action recognition. Specifically, we apply a Modality Generalization network with Grayscale enhancement (MGG) to learn modality-specific representations and the interaction between the infrared and RGB modalities. Then, we construct a unified representation space through the Uniform Modality Representation (UMR) module, which preserves modality information while enhancing the overall representation ability. Finally, the feature extractors adaptively leverage modality-specific knowledge to realize cross-modal collaborative learning. Extensive experiments conducted on three widely used public benchmarks, InfAR, HMDB51, and UCF101, demonstrate the effectiveness and strength of the proposed method.
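The abstract only names the building blocks of C3L (per-modality encoders, a UMR-style shared representation space, and attention-driven collaborative fusion). The PyTorch sketch below is one possible reading of that pipeline; every class name (ModalityEncoder, C3LSketch), layer size, and the specific attention form are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a C3L-style cross-modal pipeline, assuming a two-stream
# design (RGB + infrared), a shared projection standing in for UMR, and a
# learned per-modality weighting standing in for implicit attention.
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Stand-in for a per-modality video backbone (RGB or infrared)."""
    def __init__(self, in_channels: int, feat_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x)


class C3LSketch(nn.Module):
    """Illustrative cross-modal collaborative learner (assumed structure)."""
    def __init__(self, num_classes: int, feat_dim: int = 256):
        super().__init__()
        self.rgb_enc = ModalityEncoder(in_channels=3, feat_dim=feat_dim)
        self.ir_enc = ModalityEncoder(in_channels=1, feat_dim=feat_dim)
        # Shared projection: both modalities mapped into one unified space.
        self.umr = nn.Linear(feat_dim, feat_dim)
        # Implicit attention: modality weights predicted from the joint feature.
        self.attn = nn.Sequential(nn.Linear(2 * feat_dim, 2), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, rgb: torch.Tensor, infrared: torch.Tensor) -> torch.Tensor:
        f_rgb = self.umr(self.rgb_enc(rgb))               # (B, feat_dim)
        f_ir = self.umr(self.ir_enc(infrared))            # (B, feat_dim)
        w = self.attn(torch.cat([f_rgb, f_ir], dim=-1))   # (B, 2) modality weights
        fused = w[:, :1] * f_rgb + w[:, 1:] * f_ir        # weighted collaborative fusion
        return self.classifier(fused)


if __name__ == "__main__":
    model = C3LSketch(num_classes=51)                     # e.g. HMDB51 has 51 classes
    rgb_clip = torch.randn(2, 3, 8, 112, 112)             # batch of RGB video clips
    ir_clip = torch.randn(2, 1, 8, 112, 112)              # matching infrared clips
    print(model(rgb_clip, ir_clip).shape)                 # torch.Size([2, 51])
```

The fusion step here uses a simple softmax weighting purely for illustration; the paper's MGG grayscale enhancement and the exact attention mechanism are not specified in the abstract.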
Keywords
Action recognition,Modality generalization,Collaborative learning,Uniform modality representation