Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
arxiv(2023)
摘要
Monocular depth estimation is a fundamental computer vision task. Recovering
3D depth from a single image is geometrically ill-posed and requires scene
understanding, so it is not surprising that the rise of deep learning has led
to a breakthrough. The impressive progress of monocular depth estimators has
mirrored the growth in model capacity, from relatively modest CNNs to large
Transformer architectures. Still, monocular depth estimators tend to struggle
when presented with images with unfamiliar content and layout, since their
knowledge of the visual world is restricted by the data seen during training,
and challenged by zero-shot generalization to new domains. This motivates us to
explore whether the extensive priors captured in recent generative diffusion
models can enable better, more generalizable depth estimation. We introduce
Marigold, a method for affine-invariant monocular depth estimation that is
derived from Stable Diffusion and retains its rich prior knowledge. The
estimator can be fine-tuned in a couple of days on a single GPU using only
synthetic training data. It delivers state-of-the-art performance across a wide
range of datasets, including over 20
Project page: https://marigoldmonodepth.github.io.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要