KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos

IEEE Conference on Computer Vision and Pattern Recognition (2022)

Abstract
We consider the problem of reconstructing the depth of dynamic objects from videos. Recent progress in dynamic video depth prediction has focused on improving the output of monocular depth estimators by means of multi-view constraints, while imposing little to no restriction on the deformation of the dynamic parts of the scene. However, the theory of Non-Rigid Structure from Motion prescribes constraining the deformations for 3D reconstruction. We thus propose a new model that departs significantly from this prior work. The idea is to fit a dynamic point cloud to the video data, using Sinkhorn's algorithm to associate the 3D points with 2D pixels and a differentiable point renderer to ensure that the 3D deformations are compatible with the measured optical flow. In this manner, our algorithm, called Keypoint Transporter, models the overall deformation of the object within the entire video, so that it can constrain the reconstruction accordingly. Compared to weaker deformation models, this significantly reduces the reconstruction ambiguity and, for dynamic objects, allows Keypoint Transporter to obtain reconstructions of quality superior to, or at least comparable to, prior approaches while being much faster and relying on a pre-trained monocular depth estimator network. To assess the method, we evaluate it on new datasets of synthetic videos depicting dynamic humans and animals with ground-truth depth. We also show qualitative results on crowd-sourced real-world videos of pets.
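The association step mentioned in the abstract can be illustrated with a generic Sinkhorn iteration. This is only a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not specify the cost function, the marginals, or the camera model, so the squared reprojection distance, the uniform marginals, the pinhole `project` helper, and all parameter values below are hypothetical choices made for illustration.

```python
import numpy as np


def sinkhorn(cost, epsilon=0.05, n_iters=200):
    """Entropy-regularised optimal transport with uniform marginals.

    cost: (n, m) array of assignment costs.
    Returns an (n, m) soft assignment matrix. Columns sum to 1/m and
    rows sum to approximately 1/n once the iteration has converged.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # target row marginal (assumed uniform)
    b = np.full(m, 1.0 / m)          # target column marginal (assumed uniform)
    K = np.exp(-cost / epsilon)      # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows to match a
        v = b / (K.T @ u)            # rescale columns to match b
    return u[:, None] * K * v[None, :]


def project(points_3d, focal=1.0):
    """Hypothetical pinhole projection of (n, 3) points to (n, 2) pixels."""
    return focal * points_3d[:, :2] / points_3d[:, 2:3]


# Toy data: three 3D points and their 2D observations in shuffled order.
points_3d = np.array([[0.0, 0.0, 2.0],
                      [2.0, 0.0, 2.0],
                      [0.0, 2.0, 2.0]])
pixels = project(points_3d)[[2, 0, 1]]

# Assumed cost: squared distance between each projected point and each pixel.
diff = project(points_3d)[:, None, :] - pixels[None, :, :]
cost = (diff ** 2).sum(axis=-1)

P = sinkhorn(cost)
match = P.argmax(axis=1)             # hard assignment: pixel index per 3D point
print(match)
```

In this toy case the soft assignment is close to a scaled permutation matrix, so taking the row-wise argmax recovers the shuffle. Because every step is a differentiable array operation, the same iteration can in principle sit inside a gradient-based fitting loop, which is the property that makes Sinkhorn attractive for this kind of point-to-pixel association.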
Keywords
3D from multi-view and sensors