A Unified Approach for Text- and Image-guided 4D Scene Generation
CVPR 2024(2023)
摘要
Large-scale diffusion generative models are greatly simplifying image, video
and 3D asset creation from user-provided text prompts and images. However, the
challenging problem of text-to-4D dynamic 3D scene generation with diffusion
guidance remains largely unexplored. We propose Dream-in-4D, which features a
novel two-stage approach for text-to-4D synthesis, leveraging (1) 3D and 2D
diffusion guidance to effectively learn a high-quality static 3D asset in the
first stage; (2) a deformable neural radiance field that explicitly
disentangles the learned static asset from its deformation, preserving quality
during motion learning; and (3) a multi-resolution feature grid for the
deformation field with a displacement total variation loss to effectively learn
motion with video diffusion guidance in the second stage. Through a user
preference study, we demonstrate that our approach significantly advances image
and motion quality, 3D consistency and text fidelity for text-to-4D generation
compared to baseline approaches. Thanks to its motion-disentangled
representation, Dream-in-4D can also be easily adapted for controllable
generation where appearance is defined by one or multiple images, without the
need to modify the motion learning stage. Thus, our method offers, for the
first time, a unified approach for text-to-4D, image-to-4D and personalized 4D
generation tasks.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要