Multimodal Representation Learning For Real-World Applications

Multimodal Interfaces and Machine Learning for Multimodal Interaction (2022)

Abstract
Multimodal representation learning has shown tremendous improvement in recent years. An extensive body of work on fusing multiple modalities has produced promising results on public benchmarks. However, the best-known works target unrealistic settings or toy datasets, and a considerable gap remains between existing methods and their real-world implications. In this work, we aim to bridge the gap between well-defined benchmark settings and real-world use cases. We explore architectures, inspired by existing promising approaches, that have the potential to be deployed in real-world instances. Moreover, we try to move the research forward by addressing questions that can be solved with multimodal approaches and that have a considerable impact on the community. With this work, we attempt to leverage multimodal representation learning methods that apply directly to real-world settings.
Keywords
Multimodal Representations, Multimodal Fusion, Cross-modal Processing, Deep Learning Architectures, Machine Learning