Emotion Recognition Based on Wasserstein Distance Fusion of Audiovisual Features

Nianxin Ai, Shuiping Zhang, Niannian Yi, Zhiwei Ma

2023 6th International Conference on Robotics, Control and Automation Engineering (RCAE) (2023)

Abstract
Multimodal emotion recognition aims to identify human emotions from both speech and facial expressions. Because the modalities carry complementary information, fusion has become a prominent research topic in this area. Merging multiple modalities, however, raises two challenges: high-dimensional feature vectors and redundant information, both of which make the data more complex. To address these challenges, we introduce a model based on the Wasserstein distance. The model extracts meaningful features from each modality by minimizing the Wasserstein distance, and selecting these features reduces information redundancy. The effective features from both modalities are then integrated, which partially mitigates the high dimensionality of the feature data and simplifies it. Finally, classification is performed on the integrated effective features to determine the emotion category. Experiments on the RAVDESS dataset show that our model reaches an accuracy of 85.2%, 3.7% higher than post-fusion models.
Keywords
multimodal, emotion recognition, Wasserstein distance, fusion
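
The abstract names the Wasserstein distance as the criterion for selecting features before fusion but does not say how the distance is minimized or how features are chosen. The following is only a minimal sketch of the general idea, not the authors' method. It assumes that audio and visual features have already been projected to arrays of the same shape, and it uses a hypothetical selection rule: keep the k feature dimensions whose audio and visual distributions are closest under the empirical 1D Wasserstein-1 distance, then concatenate them. The names `wasserstein_1d` and `select_and_fuse` are illustrative.

```python
import numpy as np


def wasserstein_1d(x, y):
    """Empirical Wasserstein-1 distance between two equal-size 1D samples.

    For equal-size samples with uniform weights, the optimal transport plan
    matches sorted values, so the distance is the mean absolute difference
    of the sorted samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))


def select_and_fuse(audio, visual, k):
    """Hypothetical selection rule (an assumption, not from the paper).

    audio, visual: arrays of shape (n_samples, n_features), assumed to live
    in a shared feature space so that column j is comparable across both.
    Keeps the k columns with the smallest per-column Wasserstein distance
    and concatenates them, giving a fused array of shape (n_samples, 2 * k).
    """
    audio = np.asarray(audio, dtype=float)
    visual = np.asarray(visual, dtype=float)
    if audio.shape != visual.shape:
        raise ValueError("this sketch assumes matching feature shapes")
    dists = np.array(
        [wasserstein_1d(audio[:, j], visual[:, j]) for j in range(audio.shape[1])]
    )
    keep = np.argsort(dists)[:k]  # indices of the k closest dimensions
    fused = np.concatenate([audio[:, keep], visual[:, keep]], axis=1)
    return fused, keep
```

Under these assumptions the fused array has 2k columns instead of the full concatenation of both feature sets, which is one way the dimensionality reduction mentioned in the abstract could arise; a classifier would then be trained on the fused array.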