Modality-invariant and Specific Prompting for Multimodal Human Perception Understanding.
CoRR (2023)
Abstract
Understanding human perceptions presents a formidable multimodal challenge
for computers, encompassing aspects such as sentiment tendencies and sense of
humor. While various methods have recently been introduced to extract
modality-invariant and specific information from diverse modalities, with the
goal of enhancing the efficacy of multimodal learning, few works emphasize this
aspect in large language models. In this paper, we introduce a novel multimodal
prompt strategy tailored for tuning large language models. Our method assesses
the correlation among different modalities and isolates the modality-invariant
and specific components, which are then utilized for prompt tuning. This
approach enables large language models to efficiently and effectively
assimilate information from various modalities. Furthermore, our strategy is
designed with scalability in mind, allowing the integration of features from
any modality into pretrained large language models. Experimental results on
public datasets demonstrate that our proposed method significantly improves
performance compared to previous methods.
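
The abstract does not specify how the correlation is assessed or how the components are isolated, so the following is only an illustrative sketch and not the paper's implementation. It assumes each modality has already been encoded into a vector of the same dimension, uses cosine similarity as a stand-in correlation measure, takes the mean across modalities as the modality-invariant component, and takes each modality's residual orthogonal to that mean as its specific component. The names `split_invariant_specific` and `build_prompt` are hypothetical.

```python
import numpy as np


def split_invariant_specific(features):
    """Split per-modality vectors into a shared part and per-modality residuals.

    features: dict mapping a modality name to a 1-D array of shape (d,).
    Assumption: all modalities were already projected to the same dimension d.
    """
    names = sorted(features)
    X = np.stack([features[n] for n in names])            # shape (m, d)

    # Stand-in correlation measure: pairwise cosine similarity between modalities.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    corr = Xn @ Xn.T                                      # shape (m, m)

    # Assumed invariant component: the mean feature across modalities.
    invariant = X.mean(axis=0)
    u = invariant / np.linalg.norm(invariant)

    # Assumed specific component: what remains of each modality after
    # removing its projection onto the invariant direction.
    specific = {n: x - (x @ u) * u for n, x in zip(names, X)}
    return corr, invariant, specific


def build_prompt(invariant, specific):
    """Stack the components into (m + 1) soft-prompt vectors of dimension d."""
    return np.stack([invariant] + [specific[n] for n in sorted(specific)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = {name: rng.normal(size=8) for name in ("text", "audio", "video")}
    corr, invariant, specific = split_invariant_specific(feats)
    prompt = build_prompt(invariant, specific)
    print(prompt.shape)  # one invariant vector plus one vector per modality
```

In a prompt-tuning setup, vectors like these would be prepended to the input embeddings of a frozen language model and trained together with the modality encoders. Adding a new modality only adds one more entry to `features`, which is one way to read the scalability claim above.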