MInD: Improving Multimodal Sentiment Analysis via Multimodal Information Disentanglement
CoRR (2024)
Abstract
Learning effective joint representations has been a central task in
multimodal sentiment analysis. Previous methods focus on leveraging the
correlations between different modalities and enhancing performance through
sophisticated fusion techniques. However, challenges still exist due to the
inherent heterogeneity of distinct modalities, which may lead to a distributional
gap, impeding the full exploitation of inter-modal information and resulting in
redundancy and impurity in the information extracted from features. To address
this problem, we introduce the Multimodal Information Disentanglement (MInD)
approach. MInD decomposes the multimodal inputs into a modality-invariant
component, a modality-specific component, and a remnant noise component for
each modality through a shared encoder and multiple private encoders. The
shared encoder aims to explore the shared information and commonality across
modalities, while the private encoders are deployed to capture the distinctive
information and characteristic features. These representations thus furnish a
comprehensive perspective of the multimodal data, facilitating the fusion
process instrumental for subsequent prediction tasks. Furthermore, MInD
improves the learned representations by explicitly modeling the task-irrelevant
noise in an adversarial manner. Experimental evaluations conducted on benchmark
datasets, including CMU-MOSI, CMU-MOSEI, and UR-Funny, demonstrate MInD's
superior performance over existing state-of-the-art methods in both multimodal
emotion recognition and multimodal humor detection tasks.
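The decomposition described above — one shared encoder producing a modality-invariant component, plus per-modality private and noise encoders — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the single-layer NumPy encoders, the feature dimensions, and the concatenation-based fusion are all placeholder assumptions standing in for whatever architecture MInD actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_encoder(dim_in, dim_out):
    # Hypothetical stand-in: a single random linear layer with tanh.
    W = rng.standard_normal((dim_in, dim_out)) * 0.1
    return lambda x: np.tanh(x @ W)

# Assumed dims: D-dim unimodal features for a batch of 4 samples,
# projected to H-dim component representations.
D, H = 32, 16
modalities = {
    "text": rng.standard_normal((4, D)),
    "audio": rng.standard_normal((4, D)),
    "video": rng.standard_normal((4, D)),
}

shared_enc = make_encoder(D, H)                          # one shared encoder across modalities
private_enc = {m: make_encoder(D, H) for m in modalities}  # one private encoder per modality
noise_enc = {m: make_encoder(D, H) for m in modalities}    # remnant-noise component per modality

components = {}
for m, x in modalities.items():
    components[m] = {
        "invariant": shared_enc(x),     # modality-invariant: shared commonality
        "specific": private_enc[m](x),  # modality-specific: distinctive features
        "noise": noise_enc[m](x),       # task-irrelevant noise (modeled adversarially in MInD)
    }

# Fusion for prediction: concatenate invariant + specific parts;
# the noise component is excluded from the downstream task.
fused = np.concatenate(
    [components[m][c] for m in modalities for c in ("invariant", "specific")],
    axis=-1,
)
print(fused.shape)  # (4, 96): 3 modalities x 2 kept components x H=16
```

In the actual method the noise branch is trained adversarially so that task-irrelevant variation is pushed out of the invariant and specific components; here it is only shown as a third output to make the three-way split explicit.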