Structure and Motion from Casual Videos.

European Conference on Computer Vision (2022)

Abstract
Casual videos, such as those captured in daily life using a hand-held camera, pose problems for conventional structure-from-motion (SfM) techniques: the camera is often roughly stationary (not much parallax), and a large portion of the video may contain moving objects. Under such conditions, state-of-the-art SfM methods tend to produce erroneous results, often failing entirely. To address these issues, we propose CasualSAM, a method to estimate camera poses and dense depth maps from a monocular, casually-captured video. Like conventional SfM, our method performs a joint optimization over 3D structure and camera poses, but uses a pretrained depth prediction network to represent 3D structure rather than sparse keypoints. In contrast to previous approaches, our method does not assume motion is rigid or determined by semantic segmentation, instead optimizing for a per-pixel motion map based on reprojection error. Our method sets a new state-of-the-art for pose and depth estimation on the Sintel dataset, and produces high-quality results for the DAVIS dataset where most prior methods fail to produce usable camera poses.
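To make the notion of reprojection error concrete, the following is a minimal sketch, not the CasualSAM implementation. It assumes a simple pinhole camera with a single focal length, a known relative pose between two frames, and an observed optical-flow vector for the pixel; the function names `reproject` and `reprojection_error` and all parameters are hypothetical. It back-projects a pixel using its depth, moves it by the relative camera pose, projects it into the second frame, and compares the result with where the flow says the pixel went. A large residual suggests the pixel does not follow the rigid camera motion, which is the kind of signal a per-pixel motion map could be based on.

```python
def reproject(u, v, depth, f, cx, cy, R, t):
    """Map pixel (u, v) with the given depth from frame i into frame j.

    f, cx, cy: assumed pinhole intrinsics (focal length, principal point).
    R, t: assumed relative pose (3x3 rotation as nested lists, 3-vector).
    """
    # Back-project to a 3D point in frame i's camera coordinates.
    X = [(u - cx) / f * depth, (v - cy) / f * depth, depth]
    # Apply the relative rigid motion: X_j = R * X + t.
    Xj = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    # Project the moved point to pixel coordinates in frame j.
    return f * Xj[0] / Xj[2] + cx, f * Xj[1] / Xj[2] + cy


def reprojection_error(u, v, depth, flow, f, cx, cy, R, t):
    """Pixel distance between the flow-predicted and geometry-predicted locations."""
    uj, vj = reproject(u, v, depth, f, cx, cy, R, t)
    du = (u + flow[0]) - uj
    dv = (v + flow[1]) - vj
    return (du * du + dv * dv) ** 0.5


# Illustrative values only: identity rotation and a small sideways translation.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
SHIFT = [0.1, 0.0, 0.0]

static_err = reprojection_error(320, 240, 2.0, (25.0, 0.0), 500.0, 320.0, 240.0, IDENTITY, SHIFT)
moving_err = reprojection_error(320, 240, 2.0, (40.0, 0.0), 500.0, 320.0, 240.0, IDENTITY, SHIFT)
```

With these illustrative values, the geometry alone moves the center pixel 25 pixels to the right, so a flow of (25, 0) is consistent with a static scene point, while a flow of (40, 0) leaves a residual of about 15 pixels. In a joint optimization of the kind the abstract describes, residuals like this would be minimized over depth and pose rather than evaluated for fixed values.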
Keywords
Structure from motion, Depth estimation, Casual video