Revealing Multimodal Contrastive Representation Learning through Latent Partial Causal Models
CoRR (2024)
Abstract
Multimodal contrastive representation learning methods have proven successful
across a range of domains, partly due to their ability to generate meaningful
shared representations of complex phenomena. To better analyze and understand
these learned representations, we introduce a unified
causal model specifically designed for multimodal data. By examining this
model, we show that multimodal contrastive representation learning excels at
identifying latent coupled variables within the proposed unified model, up to
a linear or a permutation transformation, depending on the assumptions made. Our
findings illuminate the potential of pre-trained multimodal models, e.g., CLIP,
in learning disentangled representations through a surprisingly simple yet
highly effective tool: linear independent component analysis. Experiments
demonstrate the robustness of our findings, even when the assumptions are
violated, and validate the effectiveness of the proposed method in learning
disentangled representations.
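
As a rough illustration of the last step described above, the sketch below applies linear ICA to features that are an unknown linear mixture of independent latent variables. It is a sketch under stated assumptions, not the authors' implementation: the synthetic `features` array only stands in for embeddings from a pre-trained model such as CLIP, the latents are assumed to be Laplace-distributed, and FastICA with a tanh nonlinearity is just one common linear ICA algorithm chosen for the example. All function and variable names are hypothetical.

```python
import numpy as np


def whiten(x):
    """Center x (n_samples, n_features) and rescale it to identity covariance."""
    x = x - x.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))
    return x @ eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T


def sym_decorrelate(w):
    """Symmetric orthogonalisation: W <- (W W^T)^(-1/2) W."""
    eigvals, eigvecs = np.linalg.eigh(w @ w.T)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T @ w


def fast_ica(x, n_iter=200, seed=0):
    """Estimate independent components of x with symmetric FastICA (tanh)."""
    z = whiten(x)
    n, d = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        y = z @ w.T                      # current source estimates, shape (n, d)
        g = np.tanh(y)
        g_prime = 1.0 - g ** 2
        # Fixed-point update per row: E[z g(w^T z)] - E[g'(w^T z)] w
        w_new = (g.T @ z) / n - np.diag(g_prime.mean(axis=0)) @ w
        w = sym_decorrelate(w_new)
    return z @ w.T


# Synthetic stand-in for pre-trained features (assumption, not real CLIP output):
# independent non-Gaussian latents passed through a fixed linear mixing matrix.
rng = np.random.default_rng(0)
latents = rng.laplace(size=(5000, 3))
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.3, 1.0, 0.4],
                   [0.2, 0.6, 1.0]])
features = latents @ mixing.T

recovered = fast_ica(features)

# Absolute correlation between each true latent and each recovered component.
corr = np.abs(np.corrcoef(latents.T, recovered.T)[:3, 3:])
best_match = corr.max(axis=1)
print(np.round(best_match, 3))
```

Each entry of `best_match` is the strongest absolute correlation between one true latent and any recovered component. Values close to 1 indicate that the latents were recovered up to permutation, sign, and scale, which is the usual ambiguity of linear ICA.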