Motion Generation from Fine-grained Textual Descriptions
arXiv (2024)
Abstract
The task of text2motion is to generate motion sequences from given textual
descriptions, where a model should explore the interactions between natural
language instructions and human body movements. While most existing works are
confined to coarse-grained motion descriptions (e.g., "A man squats."),
fine-grained ones specifying movements of relevant body parts are barely
explored. Models trained with coarse texts may not be able to learn mappings
from fine-grained motion-related words to motion primitives, and thus fail to
generate motions from unseen descriptions. In this paper, we build
a large-scale language-motion dataset with fine-grained textual descriptions,
FineHumanML3D, by feeding GPT-3.5-turbo with carefully designed prompts. Accordingly, we
design a new text2motion model, FineMotionDiffuse, which makes full use of
fine-grained textual information. Our experiments show that FineMotionDiffuse
trained on FineHumanML3D achieves good results in quantitative evaluation. We
also find this model can better generate spatially/chronologically composite
motions by learning the implicit mappings from simple descriptions to the
corresponding basic motions.
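The dataset-construction step described above amounts to prompting an LLM to rewrite a coarse motion description into body-part-level instructions. Below is a minimal sketch of how such a rewriting prompt might be composed; the function name and the prompt wording are illustrative assumptions, not the paper's actual prompts.

```python
def build_finegrained_prompt(coarse_description: str) -> str:
    """Compose a prompt asking an LLM to rewrite a coarse motion
    description into fine-grained, body-part-level instructions.
    (Hypothetical helper; the paper's exact prompt design differs.)"""
    return (
        "Rewrite the following human motion description so that it "
        "specifies, step by step, the movements of the relevant body "
        "parts (arms, legs, torso, head):\n"
        f'Coarse description: "{coarse_description}"\n'
        "Fine-grained description:"
    )

# The rewriting itself would be done by sending this prompt to
# GPT-3.5-turbo, e.g. via the OpenAI chat completions API:
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user",
#                "content": build_finegrained_prompt("A man squats.")}],
# )

prompt = build_finegrained_prompt("A man squats.")
print(prompt)
```

The fine-grained outputs collected this way would then be paired with the original motion sequences to form the language-motion training data.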