SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model
CoRR(2024)
摘要
The generalization of monocular metric depth estimation (MMDE) has been a
longstanding challenge. Recent methods made progress by combining relative and
metric depth or aligning input image focal length. However, they are still
beset by challenges in camera, scene, and data levels: (1) Sensitivity to
different cameras; (2) Inconsistent accuracy across scenes; (3) Reliance on
massive training data. This paper proposes SM4Depth, a seamless MMDE method, to
address all the issues above within a single network. First, we reveal that a
consistent field of view (FOV) is the key to resolve “metric ambiguity”
across cameras, which guides us to propose a more straightforward preprocessing
unit. Second, to achieve consistently high accuracy across scenes, we
explicitly model the metric scale determination as discretizing the depth
interval into bins and propose variation-based unnormalized depth bins. This
method bridges the depth gap of diverse scenes by reducing the ambiguity of the
conventional metric bin. Third, to reduce the reliance on massive training
data, we propose a “divide and conquer" solution. Instead of estimating
directly from the vast solution space, the correct metric bins are estimated
from multiple solution sub-spaces for complexity reduction. Finally, with just
150K RGB-D pairs and a consumer-grade GPU for training, SM4Depth achieves
state-of-the-art performance on most previously unseen datasets, especially
surpassing ZoeDepth and Metric3D on mRI_θ. The code can be found at
https://github.com/1hao-Liu/SM4Depth.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要