Multimodal Emotion Recognition

Wen Wu, Xixin Wu, Qiujia Li, Guangzhi Sun, Florian Kreyssig

semanticscholar (2020)

Abstract
One of the major tasks of intelligent human-machine interaction is to empower computers with the ability of "affective computing" [4], so that they can recognize a user's emotional state and respond to the user in an affective way. This project develops a complete multimodal emotion recognition system that predicts the speaker's emotional state from speech, text, and video input. The system consists of two branches: a time-synchronous branch, in which audio, word embeddings, and video embeddings are coupled at the frame level, and a time-asynchronous branch, in which sentence embeddings are combined with their context. The two branches are then fused to make predictions. The system achieves state-of-the-art multimodal emotion classification accuracy on the IEMOCAP database, and an in-depth investigation of the properties of the different modalities and their combination is provided. The emotion recognition problem is then re-examined. The IEMOCAP database contains a large proportion of utterances whose emotion labels the human annotators do not completely agree on. Such utterances are common in reality but are usually ignored in traditional emotion classification setups. In this case, it is more reasonable to match the label distribution of the sentence rather than to perform hard classification. "Soft" labels are therefore introduced, which significantly improve label distribution matching as measured by KL divergence. Different ways of modelling the label distribution are discussed, including the proposal of a Dirichlet prior network.
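To make the two central ideas concrete, the following is a minimal sketch in PyTorch of (a) fusing a time-synchronous, frame-level branch with a time-asynchronous, sentence-level branch and (b) training against "soft" annotator label distributions with a KL-divergence criterion. The module names, dimensions, pooling, and the 4-class setup are illustrative assumptions, not the paper's actual architecture.

```python
# A hypothetical sketch, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchFusion(nn.Module):
    def __init__(self, frame_dim=128, sent_dim=256, num_classes=4):
        super().__init__()
        # Time-synchronous branch: frame-level audio/word/video features,
        # summarised here by a mean over time after a linear projection.
        self.sync_proj = nn.Linear(frame_dim, 64)
        # Time-asynchronous branch: sentence embeddings with context.
        self.async_proj = nn.Linear(sent_dim, 64)
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, frame_feats, sent_feats):
        # frame_feats: (batch, time, frame_dim); sent_feats: (batch, sent_dim)
        sync = self.sync_proj(frame_feats).mean(dim=1)   # pool over frames
        asyn = self.async_proj(sent_feats)
        fused = torch.cat([sync, asyn], dim=-1)          # simple late fusion
        return self.classifier(fused)                    # unnormalised logits

def soft_label_loss(logits, soft_labels):
    """KL divergence between the annotator label distribution and the model output."""
    log_probs = F.log_softmax(logits, dim=-1)
    return F.kl_div(log_probs, soft_labels, reduction="batchmean")

# Toy usage: random features and a soft label from, e.g., three annotators
# who split 2:1 between two emotion classes.
model = TwoBranchFusion()
frames = torch.randn(2, 50, 128)
sents = torch.randn(2, 256)
soft = torch.tensor([[2/3, 1/3, 0.0, 0.0],
                     [0.25, 0.25, 0.25, 0.25]])
loss = soft_label_loss(model(frames, sents), soft)
loss.backward()
```

Replacing one-hot targets with these soft distributions is what turns the task from hard classification into label distribution matching; a Dirichlet prior network, as proposed in the paper, would instead predict the parameters of a distribution over such label distributions.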