Self-Supervised Human Depth Estimation From Monocular Videos

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(2020)

Abstract
Previous methods for estimating detailed human depth often require supervised training with ‘ground truth’ depth data. This paper presents a self-supervised method that can be trained on YouTube videos without known depth, which simplifies training data collection and improves the generalization of the learned network. Self-supervision is achieved by minimizing a photo-consistency loss, evaluated between a video frame and its neighboring frames warped according to the estimated depth and the 3D non-rigid motion of the human body. To recover this non-rigid motion, we first estimate a rough SMPL model at each video frame and compute the non-rigid body motion from it, which enables self-supervised learning of the shape details. Experiments demonstrate that our method generalizes better and performs much better on in-the-wild data.
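The warping step behind the photo-consistency loss can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a pinhole camera with intrinsics `K`, a per-pixel 3D displacement field `motion` standing in for the SMPL-derived non-rigid body motion, and a plain L1 color difference. All function names and parameters here are hypothetical.

```python
import numpy as np

def backproject(depth, K):
    """Lift each pixel (u, v) with depth d to a 3D point in camera space."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    return np.stack([x, y, depth], axis=-1)  # shape (h, w, 3)

def project(points, K):
    """Project 3D camera-space points back to pixel coordinates."""
    z = points[..., 2]
    u = K[0, 0] * points[..., 0] / z + K[0, 2]
    v = K[1, 1] * points[..., 1] / z + K[1, 2]
    return u, v

def bilinear_sample(image, u, v):
    """Sample an (h, w, c) image at float pixel locations, clamped to the border."""
    h, w = image.shape[:2]
    u = np.clip(u, 0, w - 1)
    v = np.clip(v, 0, h - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, w - 1)
    v1 = np.minimum(v0 + 1, h - 1)
    au = (u - u0)[..., None]
    av = (v - v0)[..., None]
    top = (1 - au) * image[v0, u0] + au * image[v0, u1]
    bottom = (1 - au) * image[v1, u0] + au * image[v1, u1]
    return (1 - av) * top + av * bottom

def photo_consistency_loss(frame, neighbor, depth, motion, K, mask=None):
    """Mean L1 color difference between `frame` and `neighbor` warped into it.

    depth:  (h, w) estimated depth of `frame`
    motion: (h, w, 3) assumed per-pixel 3D displacement from `frame` to `neighbor`
    mask:   optional (h, w) boolean array selecting the pixels to score
    """
    moved = backproject(depth, K) + motion   # 3D points at the neighbor's time
    u, v = project(moved, K)                 # where they land in the neighbor
    warped = bilinear_sample(neighbor, u, v)
    error = np.abs(warped - frame).mean(axis=-1)
    if mask is None:
        mask = np.ones(depth.shape, dtype=bool)
    return error[mask].mean()
```

In training, the same computation would be written in an automatic-differentiation framework so that the loss gradient can flow back into the depth estimate. The `mask` argument is one simple way to restrict the loss to pixels that stay inside the image, or to the human region.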
Keywords
video frame, neighboring frames, estimated depth, nonrigid motion, human body, nonrigid body motion, self-supervised learning, supervised human depth estimation, monocular videos, detailed human depth, supervised training, ground truth depth data, self-supervised method, YouTube videos, training data collection simple, learned network, photo-consistency loss