Reality's Canvas, Language's Brush: Crafting 3D Avatars from Monocular Video
CoRR (2023)
Abstract
Recent advances in 3D avatar generation produce photorealistic models when
trained with multi-view supervision. Monocular counterparts, however, lag in
quality despite their broader applicability. We propose ReCaLab to close this
gap. ReCaLab is a fully differentiable pipeline that learns high-fidelity 3D
human avatars from just a single RGB video. A pose-conditioned deformable NeRF
is optimized to volumetrically represent a human subject in a canonical T-pose.
This canonical representation is then leveraged to efficiently associate
viewpoint-agnostic textures using 2D-3D correspondences. This lets albedo and
shading be generated separately and then jointly composed into an RGB
prediction. The design allows intermediate results for human pose, body shape,
texture, and lighting to be controlled with text prompts. In addition, an
image-conditioned diffusion model helps animate the appearance and pose of the
3D avatar, creating video sequences with previously unseen human motion.
Extensive experiments show that ReCaLab outperforms previous monocular
approaches in image quality on image synthesis tasks. For novel pose rendering,
ReCaLab even outperforms multi-view methods that leverage up to 19x more
synchronized videos. Moreover, natural language offers an intuitive user
interface for creative manipulation of 3D human avatars.
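The abstract states that albedo and shading are generated separately and jointly compose the RGB prediction, but it does not give the composition rule. The sketch below assumes the common intrinsic-image model, where each pixel's color is its albedo multiplied by a scalar shading value. It also uses a hypothetical Lambertian shading term (ambient plus clamped cosine). The names `lambertian_shading` and `compose_rgb`, the `ambient` parameter, and the clamping to [0, 1] are illustrative assumptions, not ReCaLab's implementation.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def clamp01(x: float) -> float:
    """Clamp a scalar to the displayable range [0, 1]."""
    return max(0.0, min(1.0, x))


def normalize(v: Vec3) -> Vec3:
    """Scale a 3-vector to unit length (assumes a non-zero vector)."""
    length = (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) ** 0.5
    return (v[0] / length, v[1] / length, v[2] / length)


def lambertian_shading(normal: Vec3, light_dir: Vec3, ambient: float = 0.2) -> float:
    """Hypothetical shading: ambient term plus clamped cosine of normal vs. light."""
    n = normalize(normal)
    l = normalize(light_dir)
    cos_theta = max(0.0, n[0] * l[0] + n[1] * l[1] + n[2] * l[2])
    return ambient + (1.0 - ambient) * cos_theta


def compose_rgb(albedo: Sequence[Vec3], shading: Sequence[float]) -> List[Vec3]:
    """Assumed intrinsic model: per-pixel RGB = albedo * shading, clamped to [0, 1]."""
    if len(albedo) != len(shading):
        raise ValueError("albedo and shading must have one entry per pixel")
    rgb: List[Vec3] = []
    for (r, g, b), s in zip(albedo, shading):
        rgb.append((clamp01(r * s), clamp01(g * s), clamp01(b * s)))
    return rgb
```

For example, `compose_rgb([(0.8, 0.6, 0.5)], [lambertian_shading((1.0, 0.0, 0.0), (0.0, 0.0, 1.0))])` darkens a surface facing away from the light to the ambient 20% of its albedo. Under this assumed model, keeping albedo and shading as separate factors means a lighting edit changes only the shading term and leaves the texture untouched. This is the kind of separation that lets lighting be controlled independently of texture.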