An Interactive Agent Foundation Model
CoRR (2024)
Abstract
The development of artificial intelligence systems is transitioning from
creating static, task-specific models to dynamic, agent-based systems capable
of performing well in a wide range of applications. We propose an Interactive
Agent Foundation Model that uses a novel multi-task agent training paradigm for
training AI agents across a wide range of domains, datasets, and tasks. Our
training paradigm unifies diverse pre-training strategies, including visual
masked auto-encoders, language modeling, and next-action prediction, enabling a
versatile and adaptable AI framework. We evaluate the framework in three
separate domains: Robotics, Gaming AI, and Healthcare. In each, the model
generates meaningful and contextually relevant outputs. The strength of our approach lies in its
generality, leveraging a variety of data sources such as robotics sequences,
gameplay data, large-scale video datasets, and textual information for
effective multimodal and multi-task learning. Our approach provides a promising
avenue for developing generalist, action-taking, multimodal systems.
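
As a rough illustration of how the three pre-training strategies named above (language modeling, visual masked auto-encoding, and next-action prediction) could be unified into one training signal, the sketch below combines one loss per strategy into a single scalar. The abstract does not specify how the objectives are combined, so the weighted sum, the equal default weights, the toy inputs, and the helper names cross_entropy, masked_mse, and joint_loss are all assumptions made for illustration, not the authors' implementation.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(probs[target])

def masked_mse(pred, truth, mask):
    """Mean squared error over masked positions only (mask[i] == 1)."""
    errs = [(p - t) ** 2 for p, t, m in zip(pred, truth, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0

def joint_loss(lang, mae, act, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three objectives (equal weights are an assumption)."""
    w_lang, w_mae, w_act = weights
    return w_lang * lang + w_mae * mae + w_act * act

# Toy values standing in for model outputs (hypothetical).
lang = cross_entropy([0.1, 0.7, 0.2], target=1)    # next-token prediction
mae = masked_mse([0.5, 0.2, 0.9], [0.4, 0.0, 1.0],
                 mask=[1, 0, 1])                   # masked patch reconstruction
act = cross_entropy([0.25, 0.25, 0.5], target=2)   # next-action prediction
total = joint_loss(lang, mae, act)
```

In a setup like this, every batch contributes to the same scalar regardless of which modality it comes from, which is one simple way a single model could be trained on robotics sequences, gameplay data, video, and text together.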