Self-Attentive Feature-Level Fusion for Multimodal Emotion Detection

2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)(2018)

Cited by 36
Abstract
Multimodal emotion recognition is the task of detecting emotions present in user-generated multimedia content. Such resources contain complementary information across multiple modalities. A key challenge is the complexity of feature-level fusion of these heterogeneous modalities. In this paper, we propose a new feature-level fusion method based on the self-attention mechanism, and compare it with traditional fusion methods such as concatenation and the outer product. Evaluated on textual and speech (audio) modalities, our results suggest that the proposed fusion method outperforms the others for utterance-level emotion recognition in videos.
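To make the contrast between fusion strategies concrete, the following is a minimal NumPy sketch, not the authors' exact architecture: it assumes both modality features share one dimension, and it omits the learned projections and training machinery a real model would use. Self-attentive fusion weighs the modalities against each other before pooling, whereas concatenation simply stacks them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attentive_fusion(text_feat, audio_feat):
    """Fuse two modality vectors via scaled dot-product self-attention.

    text_feat, audio_feat: (d,) feature vectors of equal dimension d.
    Returns a fused (d,) vector (an illustrative sketch, not the paper's model).
    """
    X = np.stack([text_feat, audio_feat])  # (2, d): one row per modality
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)          # (2, 2) cross-modality similarities
    attn = softmax(scores, axis=-1)        # attention weights per modality
    return (attn @ X).mean(axis=0)         # attend, then pool to (d,)

def concat_fusion(text_feat, audio_feat):
    # Baseline: simple concatenation, [text; audio] -> (2d,).
    return np.concatenate([text_feat, audio_feat])
```

Note that the self-attentive variant keeps the fused representation at dimension d regardless of the number of modalities, while concatenation grows linearly (and an outer product quadratically) with it.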
Keywords
Multimodal emotion recognition, Feature-level fusion, Self-attention