3D Human Pose Perception from Egocentric Stereo Videos
CoRR (2023)
Abstract
While head-mounted devices are becoming more compact, they provide egocentric
views with significant self-occlusions of the device user. Hence, existing
methods often fail to accurately estimate complex 3D poses from egocentric
views. In this work, we propose a new transformer-based framework to improve
egocentric stereo 3D human pose estimation, which leverages the scene
information and temporal context of egocentric stereo videos. Specifically, we
utilize 1) depth features from our 3D scene reconstruction module with
uniformly sampled windows of egocentric stereo frames, and 2) human joint
queries enhanced by temporal features of the video inputs. Our method is able
to accurately estimate human poses even in challenging scenarios, such as
crouching and sitting. Furthermore, we introduce two new benchmark datasets,
i.e., UnrealEgo2 and UnrealEgo-RW (RealWorld). The proposed datasets offer a
much larger number of egocentric stereo views with a wider variety of human
motions than the existing datasets, allowing comprehensive evaluation of
existing and upcoming methods. Our extensive experiments show that the proposed
approach significantly outperforms previous methods. We will release
UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
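The abstract mentions feeding "uniformly sampled windows of egocentric stereo frames" into the scene reconstruction module. A minimal sketch of such uniform window sampling is below; the function name, parameters, and clamping behavior are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of uniform temporal window sampling around a target
# frame. All names and parameters are illustrative assumptions.

def sample_window(num_frames: int, target: int, window_size: int, stride: int):
    """Return `window_size` frame indices spaced `stride` apart,
    centered on `target`, clamped to the valid range [0, num_frames - 1]."""
    half = window_size // 2
    indices = [target + (i - half) * stride for i in range(window_size)]
    # Clamp so sampled indices never fall outside the video.
    return [min(max(i, 0), num_frames - 1) for i in indices]
```

For example, with 100 frames, a target frame of 50, a window of 5, and a stride of 2, this yields the indices [46, 48, 50, 52, 54]; per-frame features from such a window could then be pooled to provide temporal context.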