3D generation on ImageNet

ICLR 2023

Abstract
All existing 3D-from-2D generators are designed for well-curated, alignable datasets: objects can be placed in the same position, similarly scaled, and oriented so that the camera always points to the center of the scene. This alignment procedure is infeasible for diverse, in-the-wild datasets: 1) it requires expensive annotation for each object category; and 2) most images are inherently "non-alignable" (e.g., it is impossible to align a "cat face" with a "kitchen"). As a result, existing 3D generators do not scale to large in-the-wild datasets. In this work, we propose, for the first time, a 3D generator that works on non-aligned datasets. First, we develop a technique that uses an off-the-shelf, imprecise depth estimator to incorporate a 3D inductive bias into a GAN-based generator. Then, we create a novel learnable camera parametrization that makes no alignment assumptions, and construct a camera gradient penalty regularization. Finally, we propose a simple distillation-based technique to transfer knowledge from an off-the-shelf feature embedder, such as ResNet50, into our discriminator. Our work is the first to develop a 3D generator for non-aligned data. We conduct experiments on SDIP Dogs, SDIP Elephants, LSUN Horse, and ImageNet at 256×256 resolution to demonstrate the effectiveness of our ideas. Visualizations: https://u2wjb9xxz9q.github.io.
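The two regularization ideas named in the abstract, a penalty on the output's sensitivity to the learned camera parameters and a distillation loss matching discriminator features to a frozen embedder such as ResNet50, can be sketched in miniature. This is a hypothetical NumPy illustration only: the `render` function is a toy stand-in for the paper's volumetric generator, the finite-difference penalty and MSE distillation loss are assumed forms, not the authors' actual objectives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the generator/renderer: maps 6 camera parameters
# (hypothetical parametrization) to a 16-dim "image" vector.
W = rng.standard_normal((6, 16))

def render(camera_params):
    return np.tanh(camera_params @ W)

def camera_gradient_penalty(camera_params, eps=1e-4):
    """Finite-difference sketch of a penalty on how sharply the rendered
    output changes with the camera parameters (assumed form of the
    paper's camera gradient penalty)."""
    base = render(camera_params)
    grad_sq = 0.0
    for i in range(camera_params.size):
        step = np.zeros_like(camera_params)
        step[i] = eps
        grad_sq += np.sum(((render(camera_params + step) - base) / eps) ** 2)
    return float(grad_sq)

def distillation_loss(disc_feats, teacher_feats):
    """Match the discriminator's intermediate features to those of a
    frozen embedder (e.g. ResNet50) via mean squared error, a common
    distillation objective."""
    return float(np.mean((disc_feats - teacher_feats) ** 2))
```

In a full training loop, both terms would be added (with tuned weights) to the usual adversarial losses; here they only illustrate the shape of the objectives the abstract describes.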
Keywords
3d-generation,gans,generative adversarial networks,knowledge distillation,nerf,stylegan,radiance fields,volume rendering