SPAD: Spatially Aware Multiview Diffusers
CoRR (2024)
Abstract
We present SPAD, a novel approach for creating consistent multi-view images
from text prompts or single images. To enable multi-view generation, we
repurpose a pretrained 2D diffusion model by extending its self-attention
layers with cross-view interactions, and fine-tune it on a high-quality subset
of Objaverse. We find that a naive extension of the self-attention proposed in
prior work (e.g. MVDream) leads to content copying between views. Therefore, we
explicitly constrain the cross-view attention based on epipolar geometry. To
further enhance 3D consistency, we utilize Plücker coordinates derived from
camera rays and inject them as a positional encoding. This enables SPAD to
reason accurately about spatial proximity in 3D. In contrast to recent works that can only
generate views at fixed azimuth and elevation, SPAD offers full camera control
and achieves state-of-the-art results in novel view synthesis on unseen objects
from the Objaverse and Google Scanned Objects datasets. Finally, we demonstrate
that text-to-3D generation using SPAD avoids the multi-face Janus issue. See
more details at our webpage: https://yashkant.github.io/spad
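The Plücker positional encoding described above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not SPAD's actual implementation: the function name, the camera convention (+z forward, pixel centers offset by 0.5), and the ordering of direction before moment are all illustrative choices.

```python
import torch

def plucker_ray_embedding(K: torch.Tensor, c2w: torch.Tensor,
                          H: int, W: int) -> torch.Tensor:
    """Per-pixel Plucker coordinates (d, o x d) for a pinhole camera.

    K   : (3, 3) camera intrinsics.
    c2w : (4, 4) camera-to-world extrinsics.
    Returns an (H, W, 6) tensor usable as a positional encoding.
    """
    # Pixel-center grid in image coordinates.
    j, i = torch.meshgrid(
        torch.arange(H, dtype=torch.float32) + 0.5,
        torch.arange(W, dtype=torch.float32) + 0.5,
        indexing="ij",
    )
    # Unproject pixels to camera-space ray directions.
    dirs = torch.stack(
        [(i - K[0, 2]) / K[0, 0], (j - K[1, 2]) / K[1, 1], torch.ones_like(i)],
        dim=-1,
    )
    # Rotate directions into world space and normalize.
    d = dirs @ c2w[:3, :3].T
    d = d / d.norm(dim=-1, keepdim=True)
    # Every ray shares the camera center as its origin.
    o = c2w[:3, 3].expand_as(d)
    # Plucker line: direction and moment (o x d); the moment is
    # invariant to sliding the origin along the ray.
    return torch.cat([d, torch.cross(o, d, dim=-1)], dim=-1)
```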
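The epipolar constraint on cross-view attention can likewise be sketched as a mask built from the fundamental matrix between two views: attention from a source pixel is allowed only at target pixels lying near that pixel's epipolar line. The helper name, latent resolution, and the 1.5-pixel threshold below are assumptions for illustration; SPAD's exact masking scheme may differ.

```python
import torch

def epipolar_attention_mask(F_ij: torch.Tensor, H: int, W: int,
                            thresh: float = 1.5) -> torch.Tensor:
    """Boolean (H*W, H*W) mask for cross-view attention.

    F_ij : (3, 3) fundamental matrix; l = F_ij @ x maps a homogeneous
           pixel x in view i to its epipolar line l in view j.
    Entry [s, t] is True iff target pixel t lies within `thresh`
    pixels of source pixel s's epipolar line.
    """
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32) + 0.5,
        torch.arange(W, dtype=torch.float32) + 0.5,
        indexing="ij",
    )
    # Homogeneous pixel coordinates, flattened to (N, 3).
    pts = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)
    # One epipolar line in view j per source pixel in view i: (N, 3).
    lines = pts @ F_ij.T
    # Point-to-line distances |l . x'| / sqrt(a^2 + b^2) for all pairs.
    num = (lines @ pts.T).abs()                       # (N, N)
    denom = lines[:, :2].norm(dim=-1, keepdim=True)   # (N, 1)
    dist = num / denom.clamp(min=1e-8)
    # True = attention from source pixel s to target pixel t allowed.
    return dist <= thresh
```

At attention time, such a boolean mask would typically be converted into an additive bias (0 where allowed, -inf elsewhere) applied to the attention logits before the softmax.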