NHFNET: A Non-Homogeneous Fusion Network for Multimodal Sentiment Analysis

2022 IEEE International Conference on Multimedia and Expo (ICME)

Abstract
Fusion technology is crucial for multimodal sentiment analysis. Recent attention-based fusion methods achieve high performance and strong robustness. However, these approaches ignore the difference in information density among the three modalities: visual and audio carry low-level signal features, whereas text carries high-level semantic features. To this end, we propose a non-homogeneous fusion network (NHFNet) to achieve multimodal information interaction. Specifically, a fusion module with attention aggregation handles the fusion of the visual and audio modalities, lifting them to high-level semantic features. Cross-modal attention is then used to reinforce the interaction between the text modality and the audio-visual fusion. NHFNet compensates for the differences in information density across modalities, enabling their fair interaction. To verify the effectiveness of the proposed method, we conduct experiments on both the aligned and unaligned settings of the CMU-MOSEI dataset. The experimental results show that the proposed method outperforms the state of the art. Code is available at https://github.com/skeletonNN/NHFNet.
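The sketch below illustrates the two-stage fusion the abstract describes: visual and audio sequences are first fused by self-attention over their concatenation ("attention aggregation"), and the text modality then attends to the fused audio-visual stream via cross-modal attention. This is a minimal, hypothetical re-implementation based only on the abstract; the class names, dimensions, pooling, and attention details are assumptions, not the authors' released code (see the repository above for that).

```python
# Hypothetical sketch of NHFNet-style non-homogeneous fusion.
# All module names and hyperparameters are assumptions from the abstract,
# not the authors' implementation.
import torch
import torch.nn as nn


class AttentionAggregationFusion(nn.Module):
    """Fuses low-level visual and audio sequences into one sequence of
    higher-level features via self-attention over their concatenation."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # Treat both modalities as one token sequence: (B, Tv + Ta, D).
        av = torch.cat([visual, audio], dim=1)
        fused, _ = self.attn(av, av, av)
        return self.norm(av + fused)  # residual + layer norm


class NHFNetSketch(nn.Module):
    """Text (high information density) queries the fused audio-visual
    stream (low information density) through cross-modal attention."""

    def __init__(self, dim: int = 128, num_heads: int = 4, num_outputs: int = 1):
        super().__init__()
        self.av_fusion = AttentionAggregationFusion(dim, num_heads)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_outputs)

    def forward(self, text, visual, audio):
        av = self.av_fusion(visual, audio)      # (B, Tv + Ta, D)
        out, _ = self.cross_attn(text, av, av)  # queries come from text tokens
        out = self.norm(text + out)
        return self.head(out.mean(dim=1))       # mean-pool, predict sentiment


if __name__ == "__main__":
    B, Tt, Tv, Ta, D = 2, 20, 50, 50, 128  # toy, unaligned-length sequences
    model = NHFNetSketch(dim=D)
    score = model(torch.randn(B, Tt, D), torch.randn(B, Tv, D), torch.randn(B, Ta, D))
    print(score.shape)  # torch.Size([2, 1])
```

Because the text tokens serve as queries and the audio-visual tokens as keys and values, the two streams need not be temporally aligned, which is consistent with the paper reporting both aligned and unaligned experiments on CMU-MOSEI.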
Keywords
Multimodal sentiment analysis, fusion, attention aggregation, cross-modal attention