NHFNET: A Non-Homogeneous Fusion Network for Multimodal Sentiment Analysis

2022 IEEE International Conference on Multimedia and Expo (ICME)

Abstract
Fusion technology is crucial for multimodal sentiment analysis. Recent attention-based fusion methods achieve high performance and strong robustness. However, these approaches ignore the difference in information density among the three modalities: visual and audio carry low-level signal features, whereas text carries high-level semantic features. To this end, we propose a non-homogeneous fusion network (NHFNet) to achieve multimodal information interaction. Specifically, a fusion module with attention aggregation handles the fusion of the visual and audio modalities, lifting them to high-level semantic features. Cross-modal attention is then used to reinforce the interaction between the text modality and the audio-visual fusion. NHFNet compensates for the differences in information density across modalities, enabling their fair interaction. To verify the effectiveness of the proposed method, we conduct experiments on both the aligned and unaligned settings of the CMU-MOSEI dataset. The experimental results show that the proposed method outperforms the state of the art. Code is available at https://github.com/skeletonNN/NHFNet.
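The sketch below illustrates the two-stage fusion the abstract describes: visual and audio sequences are first fused by self-attention over their concatenation ("attention aggregation"), and the text modality then attends to the fused audio-visual stream via cross-modal attention. This is a minimal, hypothetical re-implementation based only on the abstract; the class names, dimensions, pooling, and attention details are assumptions, not the authors' released code (see the repository above for that).

```python
# Hypothetical sketch of NHFNet-style non-homogeneous fusion.
# All module names and hyperparameters are assumptions from the abstract,
# not the authors' implementation.
import torch
import torch.nn as nn


class AttentionAggregationFusion(nn.Module):
    """Fuses low-level visual and audio sequences into one sequence of
    higher-level features via self-attention over their concatenation."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # Treat both modalities as one token sequence: (B, Tv + Ta, D).
        av = torch.cat([visual, audio], dim=1)
        fused, _ = self.attn(av, av, av)
        return self.norm(av + fused)  # residual + layer norm


class NHFNetSketch(nn.Module):
    """Text (high information density) queries the fused audio-visual
    stream (low information density) through cross-modal attention."""

    def __init__(self, dim: int = 128, num_heads: int = 4, num_outputs: int = 1):
        super().__init__()
        self.av_fusion = AttentionAggregationFusion(dim, num_heads)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_outputs)

    def forward(self, text, visual, audio):
        av = self.av_fusion(visual, audio)      # (B, Tv + Ta, D)
        out, _ = self.cross_attn(text, av, av)  # queries come from text tokens
        out = self.norm(text + out)
        return self.head(out.mean(dim=1))       # mean-pool, predict sentiment


if __name__ == "__main__":
    B, Tt, Tv, Ta, D = 2, 20, 50, 50, 128  # toy, unaligned-length sequences
    model = NHFNetSketch(dim=D)
    score = model(torch.randn(B, Tt, D), torch.randn(B, Tv, D), torch.randn(B, Ta, D))
    print(score.shape)  # torch.Size([2, 1])
```

Because the text tokens serve as queries and the audio-visual tokens as keys and values, the two streams need not be temporally aligned, which is consistent with the paper reporting both aligned and unaligned experiments on CMU-MOSEI.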
Keywords
Multimodal sentiment analysis, fusion, attention aggregation, cross-modal attention