3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors
CoRR (2024)
Abstract
We present a two-stage text-to-3D generation system, namely 3DTopia, which
generates high-quality general 3D assets within 5 minutes using hybrid
diffusion priors. The first stage samples from a 3D diffusion prior directly
learned from 3D data. Specifically, it is powered by a text-conditioned
tri-plane latent diffusion model, which quickly generates coarse 3D samples for
fast prototyping. The second stage utilizes 2D diffusion priors to further
refine the texture of coarse 3D models from the first stage. The refinement
consists of both latent and pixel space optimization for high-quality texture
generation. To facilitate the training of the proposed system, we clean and
caption the largest open-source 3D dataset, Objaverse, by combining the power
of vision language models and large language models. Experimental results are
reported both qualitatively and quantitatively to demonstrate the performance
of the proposed system. Our code and models are available at
https://github.com/3DTopia/3DTopia
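The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustrative stub, not the project's actual API: all function names, parameters, and return values are hypothetical, standing in for the text-conditioned tri-plane latent diffusion sampler (stage 1) and the 2D-diffusion-prior texture refinement in latent and pixel space (stage 2).

```python
# Hypothetical sketch of the 3DTopia two-stage flow; every name here is
# illustrative and does not correspond to the real codebase.

def stage1_sample_coarse(prompt, steps=50):
    """Stage 1: a text-conditioned tri-plane latent diffusion model
    quickly produces a coarse 3D sample for fast prototyping.
    Here a dict stands in for the tri-plane representation."""
    return {"prompt": prompt, "quality": "coarse", "diffusion_steps": steps}

def stage2_refine_texture(coarse, latent_steps=100, pixel_steps=100):
    """Stage 2: refine the coarse model's texture with 2D diffusion
    priors, optimizing first in latent space, then in pixel space."""
    refined = dict(coarse)
    refined["quality"] = "refined"
    refined["latent_opt_steps"] = latent_steps  # latent-space texture optimization
    refined["pixel_opt_steps"] = pixel_steps    # pixel-space texture optimization
    return refined

def generate(prompt):
    """Full pipeline: coarse sampling followed by texture refinement."""
    return stage2_refine_texture(stage1_sample_coarse(prompt))

asset = generate("a wooden treasure chest")
print(asset["quality"])  # refined
```

The key design point the sketch mirrors is the hybrid-prior split: a 3D prior learned directly from 3D data gives fast, geometrically plausible drafts, while 2D priors supply the high-frequency texture detail that 3D training data alone lacks.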