Privileged Knowledge Distillation for Dimensional Emotion Recognition in the Wild.

CVPR Workshops (2023)

Abstract
Automated emotion recognition (AER) has a growing number of applications, ranging from behavior analysis in assistive robotics and e-learning to depression and pain estimation in healthcare. Systems for multimodal AER typically outperform unimodal approaches because modalities such as visual, audio, language, and physiological signals carry complementary and redundant semantic information. In practice, however, only a subset of these modalities is available at inference time, and using multiple modalities increases system complexity. This paper focuses on video-based AER and aims to enhance the accuracy of unimodal systems by leveraging the Learning Under Privileged Information (LUPI) paradigm with information from multiple modalities. Without loss of generality, this study treats the audio modality as privileged information (available only during training) and introduces a new multimodal-to-unimodal privileged knowledge distillation (PKD) method. The teacher network is a multimodal AER architecture that processes audio-visual information and distills its learned knowledge to a unimodal visual student network. We validate the proposed multimodal PKD method on the challenging RECOLA and Affwild2 datasets for video-based AER, using weak and strong baseline AER architectures as well as joint cross-attention fusion methods. The proposed method increases the average concordance correlation coefficient by 8% (absolute) on the RECOLA dataset, and by 2% (absolute) on the arousal dimension of the Affwild2 dataset. The code is available at multimodal-pkd.
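The results above are reported as a concordance correlation coefficient (CCC), which rewards both correlation and agreement in mean and scale between predictions and annotations. The sketch below uses the standard CCC definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population (biased) variance. The function name and the sample values are illustrative assumptions and are not taken from the paper or its code.

```python
def concordance_correlation_coefficient(y_true, y_pred):
    """Standard CCC between two equal-length sequences of numbers.

    Illustrative helper (hypothetical name), not the authors' implementation.
    Uses population (divide-by-n) variance and covariance.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n

    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred) / n
    cov = sum(
        (t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred)
    ) / n

    denom = var_true + var_pred + (mean_true - mean_pred) ** 2
    if denom == 0:
        # Both sequences are constant and identical: CCC is undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2.0 * cov / denom


# Made-up annotation and prediction values, for illustration only.
annotations = [0.1, 0.4, 0.35, 0.8]
predictions = [0.2, 0.5, 0.30, 0.7]
score = concordance_correlation_coefficient(annotations, predictions)
```

Unlike Pearson correlation, CCC penalizes a constant offset: predictions that track the annotations perfectly but are shifted by a fixed amount score below 1.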
Keywords
assistive robotics, audio language physiological, audio modality, audio-visual information, automated emotion recognition, behavior analysis, complementary information, cross-attention fusion methods, depression, dimensional emotion recognition, e-learning, learned knowledge, learning under privileged information paradigm, LUPI, multimodal AER architecture, multimodal PKD method, pain estimation healthcare, RECOLA dataset, redundant semantic information, teacher network, unimodal privileged knowledge distillation, unimodal visual student network, video-based AER, visual language physiological, wild