A survey of transformer-based multimodal pre-trained models

Neurocomputing (2023)

Abstract
• Multimodal pre-trained models with document-layout, vision-text, and audio-text domains as input.
• Collection of common multimodal downstream applications with related datasets.
• Modality feature embedding strategies.
• Cross-modality alignment pre-training tasks for different multimodal domains.
• Variations of the audio-text cross-modal learning architecture.
Keywords
Transformer, Pre-trained model, Multimodal, Document Layout