MHCanonNet: Multi-Hypothesis Canonical lifting Network for self-supervised 3D human pose estimation in the wild video

Pattern Recognition(2024)

引用 0|浏览37
暂无评分
摘要
Recent advancements in 3D Human Pose Estimation using fully-supervised learning approach have shown impressive results; however, these methods heavily rely on large amounts of annotated 3D data, which are challenging to obtain outside controlled laboratory environments. Therefore, in this study, we propose a new self-supervised training method designed to train a 3D human pose estimation network using unlabeled multi-view images. The model trains relative depths between joints without any 3D annotation by satisfying multi-view consistency constraints from unlabeled multi-view videos without camera calibration, while simultaneously learning representations of multiple plausible pose hypotheses. For this reason, we call our proposed network a Multi-Hypothesis Canonical Lifting Network (MHCanonNet). By enriching the diversity of extracted features and keeping various possibilities open, our network accurately estimates the final 3D pose. The key idea lies in the design of a novel and unbiased reconstruction objective function that combines multiple hypotheses from different viewpoints. The proposed approach demonstrates state-of-the-art results not only on two popular benchmark datasets, Human3.6M and MPI-INF-3DHP but also on an in-the-wild dataset, Ski-Pose, surpassing existing self-supervised training methods.
更多
查看译文
关键词
3D human pose,Self-supervised learning,Multi-view geometry
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要