Revisiting Multi-modal Emotion Learning with Broad State Space Models and Probability-guidance Fusion
CoRR (2024)
Abstract
Multi-modal Emotion Recognition in Conversation (MERC) has received
considerable attention in various fields, e.g., human-computer interaction and
recommendation systems. Most existing works perform feature disentanglement
and fusion to extract emotional contextual information from multi-modal
features for emotion classification. After revisiting the characteristics of
MERC, we
argue that long-range contextual semantic information should be extracted in
the feature disentanglement stage and the inter-modal semantic information
consistency should be maximized in the feature fusion stage. Recent State
Space Models (SSMs), notably Mamba, can model long-distance dependencies
efficiently. In this work, we therefore build on the above insights to
further improve the performance of MERC. Specifically, on the one hand, in the
feature disentanglement stage, we propose a Broad Mamba, which does not rely on
a self-attention mechanism for sequence modeling, but uses state space models
to compress emotional representations, and utilizes broad learning systems to
explore the potential data distribution in a broad space. Unlike previous
SSMs, we design a bidirectional SSM convolution to extract global context
information. On the other hand, we design a multi-modal fusion strategy based
on probability guidance to maximize the consistency of information between
modalities. Experimental results show that the proposed method can overcome the
computational and memory limitations of the Transformer when modeling long-distance
contexts, and has great potential to become a next-generation general
architecture in MERC.
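The probability-guided fusion strategy described above can be illustrated with a minimal sketch. The paper's exact formulation is not given in the abstract, so this assumes one plausible reading: each modality's features are weighted by the confidence of a unimodal emotion classifier before fusion, so that modalities whose predictions agree more sharply contribute more. All names and shapes here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def probability_guided_fusion(feats, logits):
    """Hypothetical sketch of probability-guided multi-modal fusion.

    feats:  list of M arrays, each (N, D) - per-modality utterance features
    logits: list of M arrays, each (N, C) - per-modality unimodal class logits
    Returns a (N, D) fused representation: a convex combination of the
    modality features, weighted by each modality's classifier confidence.
    """
    # Confidence = max class probability of each unimodal classifier.
    probs = [softmax(l) for l in logits]                       # each (N, C)
    conf = np.stack([p.max(axis=-1) for p in probs], axis=0)   # (M, N)
    # Normalize confidences across modalities so weights sum to 1.
    w = softmax(conf, axis=0)                                  # (M, N)
    fused = sum(w[m][:, None] * feats[m] for m in range(len(feats)))
    return fused                                               # (N, D)
```

Because the weights are a softmax over modality confidences, the fused vector always stays inside the convex hull of the per-modality features; a modality with a flat (uncertain) class distribution is down-weighted rather than discarded.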