MotionGPT: Human Motion Synthesis with Improved Diversity and Realism via GPT-3 Prompting

José Ribeiro-Gomes, Tianhui Cai, Zoltán Ádám Milacski, Chen Wu, Aayush Prakash, Shingo Jason Takagi, Amaury Aubel, Daeil Kim, Alexandre Bernardino, Fernando De la Torre

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024

Abstract
There are numerous applications for human motion synthesis, including animation, gaming, robotics, and sports science. In recent years, human motion generation from natural language has emerged as a promising alternative to costly and labor-intensive data collection methods relying on motion capture or wearable sensors (e.g., suits). Despite this, generating human motion from textual descriptions remains a challenging and intricate task, primarily due to the scarcity of large-scale supervised datasets capable of capturing the full diversity of human activity. This study proposes a new approach, called MotionGPT, to address the limitations of previous text-based human motion generation methods by utilizing the extensive semantic information available in large language models (LLMs). We first pretrain a doubly text-conditional motion diffusion model on both coarse ("high-level") and detailed ("low-level") ground-truth text data. Then, during inference, we improve motion diversity and alignment with the training set by zero-shot prompting GPT-3 for additional "low-level" details. Our method achieves new state-of-the-art quantitative results in terms of Fréchet Inception Distance (FID) and motion diversity metrics, and improves on all considered metrics. Furthermore, it exhibits strong qualitative performance, producing natural-looking results. Code is available at https://github.com/humansensinglab/MotionGPT
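The inference-time prompting step described above lends itself to a short illustration. Below is a minimal sketch, assuming the legacy openai-python (<1.0) Completion API; the prompt wording and the sample_motion() diffusion sampler are hypothetical stand-ins, not the authors' released code.

```python
# Sketch of zero-shot prompting GPT-3 for "low-level" motion details at
# inference time, as outlined in the abstract. The prompt text and the
# downstream sample_motion() call are illustrative assumptions.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder credential

def expand_to_low_level(high_level: str) -> str:
    """Zero-shot prompt GPT-3 to expand a coarse caption into body-part-level detail."""
    prompt = (
        "Describe, step by step and body part by body part, how a person "
        f"performs the following action: {high_level}"
    )
    response = openai.Completion.create(
        engine="text-davinci-003",  # GPT-3-era completion model
        prompt=prompt,
        max_tokens=128,
        temperature=0.7,
    )
    return response.choices[0].text.strip()

high_level = "a person throws a ball"
low_level = expand_to_low_level(high_level)

# Both captions would then jointly condition the pretrained, doubly
# text-conditional motion diffusion model (hypothetical API):
# motion = sample_motion(high_level_text=high_level, low_level_text=low_level)
```

Since the diffusion model is doubly text-conditional, the GPT-3 expansion only needs to run once per prompt at inference time; no retraining is involved.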
Keywords
Algorithms: Generative models for image, video, 3D, etc.; Algorithms: Biometrics, face, gesture, body pose; Applications: Arts / games / social media