Contact-aware Human Motion Generation from Textual Descriptions
arXiv (2024)
Abstract
This paper addresses the problem of generating 3D interactive human motion
from text. Given a textual description depicting the actions of different body
parts in contact with objects, we synthesize sequences of 3D body poses that
are visually natural and physically plausible. The task is challenging because
physical-contact interactions are inadequately considered in both motion data
and textual descriptions, which leads to unnatural and implausible sequences.
To tackle this challenge, we create a novel dataset named RICH-CAT (short for
“Contact-Aware Texts”), constructed from the RICH dataset. RICH-CAT comprises
high-quality motion, accurate human-object contact
labels, and detailed textual descriptions, encompassing over 8,500 motion-text
pairs across 26 indoor/outdoor actions. Leveraging RICH-CAT, we propose a novel
approach named CATMO for text-driven interactive human motion synthesis that
explicitly integrates human body contacts as evidence. We employ two VQ-VAE
models to encode motion and body contact sequences into distinct yet
complementary latent spaces and an intertwined GPT for generating human motions
and contacts in a mutually conditioned manner. Additionally, we introduce a
pre-trained text encoder to learn textual embeddings that better discriminate
among various contact types, allowing for more precise control over synthesized
motions and contacts. Our experiments demonstrate the superior performance of
our approach compared to existing text-to-motion methods, producing stable,
contact-aware motion sequences. Code and data will be available for research
purposes.
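
To make the two-codebook idea concrete, the following is a minimal sketch of the quantization step of a VQ-VAE: each per-frame latent vector is replaced by the index of its nearest codebook entry, once for motion and once for contact. It is not the authors' implementation. The function names, the toy codebooks, and the latent dimensions are all hypothetical, and the encoder, decoder, and GPT are omitted.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each latent vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


# Hypothetical toy codebooks; a trained model would learn these.
motion_codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
contact_codebook = [[0.0], [1.0]]

# Hypothetical encoder outputs for three frames (assumed, not real data).
motion_latents = [[0.1, -0.1], [0.9, 1.2], [1.8, 0.1]]
contact_latents = [[0.2], [0.8], [0.6]]

# Two separate discrete token streams, one per latent space.
motion_tokens = quantize(motion_latents, motion_codebook)
contact_tokens = quantize(contact_latents, contact_codebook)

print(motion_tokens, contact_tokens)
```

Keeping the codebooks separate gives the generator two discrete token sequences it can condition on each other. How those sequences are combined inside the intertwined GPT is not specified in the abstract and is not modeled here.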