Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
CoRR(2024)
Abstract
We tackle the challenge of efficiently reconstructing a 3D asset from a
single image at millisecond speed. Existing methods for single-image 3D
reconstruction are primarily based on Score Distillation Sampling (SDS) with
neural 3D representations. Despite promising results, these approaches
encounter practical limitations due to lengthy optimizations and significant
memory consumption. In this work, we introduce Gamba, an end-to-end 3D
reconstruction model from a single-view image, emphasizing two main insights:
(1) Efficient Backbone Design: introducing a Mamba-based GambaFormer network to
model 3D Gaussian Splatting (3DGS) reconstruction as sequential prediction that
scales linearly with token length, thereby accommodating a substantial number
of Gaussians; (2) Robust Gaussian Constraints: deriving radial mask constraints
from multi-view masks to eliminate the need for warmup supervision of 3D point
clouds in training. We trained Gamba on Objaverse and assessed it against
existing optimization-based and feed-forward 3D reconstruction approaches on
the GSO Dataset, among which Gamba is the only end-to-end trained single-view
reconstruction model with 3DGS. Experimental results demonstrate its
competitive generation capabilities both qualitatively and quantitatively and
highlight its remarkable speed: Gamba completes reconstruction within 0.05
seconds on a single NVIDIA A100 GPU, which is about 1,000× faster than
optimization-based methods. Please see our project page at
https://florinshen.github.io/gamba-project.