Text-centric Alignment for Multi-Modality Learning
CoRR (2024)
Abstract
This research paper addresses the challenge of modality mismatch in
multimodal learning, where the modalities available at inference differ
from those available during training. We propose the Text-centric Alignment for
Multi-Modality Learning (TAMML) approach, an innovative method that utilizes
Large Language Models (LLMs) with in-context learning and foundation models to
enhance the generalizability of multimodal systems under these conditions. By
leveraging the unique properties of text as a unified semantic space, TAMML
demonstrates significant improvements in handling unseen, diverse, and
unpredictable modality combinations. TAMML not only adapts to varying
modalities but also maintains robust performance, showcasing the potential of
foundation models in overcoming the limitations of traditional fixed-modality
frameworks in embedding representations. This study contributes to the field by
offering a flexible, effective solution for real-world applications where
modality availability is dynamic and uncertain.
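The core idea described above, using text as a unified semantic space so that any combination of input modalities can be handled uniformly, can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation: all function names and sample fields are assumptions, and the image-to-text step is simulated with a plain string in place of a real captioning foundation model.

```python
# Illustrative sketch of TAMML's central idea: map every available modality
# into text (the shared semantic space), so a downstream model sees one
# uniform format regardless of which modalities appear at inference time.
# All names here are hypothetical, not from the paper's code.

def serialize_tabular(row: dict) -> str:
    """Render a tabular record as a plain-text description."""
    return "; ".join(f"{k} is {v}" for k, v in row.items())

def serialize_image(caption: str) -> str:
    """Stand-in for an image-to-text foundation model (e.g. a captioner)."""
    return f"The image shows {caption}"

def to_text(sample: dict) -> str:
    """Convert whichever modalities a sample happens to have into one string."""
    parts = []
    if "tabular" in sample:
        parts.append(serialize_tabular(sample["tabular"]))
    if "image_caption" in sample:
        parts.append(serialize_image(sample["image_caption"]))
    if "text" in sample:
        parts.append(sample["text"])
    return " | ".join(parts)

# A training sample with text + tabular, and a test sample where the text
# modality is replaced by an image -- both reduce to the same text format:
train = {"text": "cozy downtown loft", "tabular": {"beds": 2, "price": 120}}
test = {"image_caption": "a bright loft with large windows",
        "tabular": {"beds": 2, "price": 120}}
print(to_text(train))
print(to_text(test))
```

Because both samples end up as plain text, the same downstream predictor (in the paper, an LLM used with in-context learning) can consume either one, which is what makes the approach robust to unseen modality combinations.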