STRIDE: Single-video based Temporally Continuous Occlusion Robust 3D Pose Estimation
arXiv (2023)
Abstract
The capability to accurately estimate 3D human poses is crucial for diverse
fields such as action recognition, gait recognition, and virtual/augmented
reality. However, a persistent and significant challenge within this field is
the accurate prediction of human poses under conditions of severe occlusion.
Traditional image-based estimators struggle with heavy occlusions due to a lack
of temporal context, resulting in inconsistent predictions. While video-based
models benefit from processing temporal data, they encounter limitations when
faced with prolonged occlusions that extend over multiple frames. This
challenge arises because these models struggle to generalize beyond their
training datasets, and the variety of occlusions is hard to capture in the
training data. Addressing these challenges, we propose STRIDE (Single-video
based TempoRally contInuous occlusion Robust 3D Pose Estimation), a novel
Test-Time Training (TTT) approach to fit a human motion prior for each video.
This approach specifically handles occlusions that were not encountered during
the model's training. By employing STRIDE, we can refine a sequence of noisy
initial pose estimates into accurate, temporally coherent poses during test
time, effectively overcoming the limitations of prior methods. Our framework
is model-agnostic: any off-the-shelf 3D pose estimation method can supply the
initial estimates, which STRIDE then refines for robustness and temporal
consistency. We validate STRIDE's efficacy through comprehensive experiments on
challenging datasets like Occluded Human3.6M, Human3.6M, and OCMotion, where it
not only outperforms existing single-image and video-based pose estimation
models but also showcases superior handling of substantial occlusions,
achieving fast, robust, accurate, and temporally consistent 3D pose estimates.
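
The abstract does not specify how the per-video refinement is optimized. As a rough illustration only of what refining a noisy pose sequence at test time can look like, the sketch below fits a smooth sequence to one video by gradient descent on a confidence-weighted data term plus a temporal smoothness term. This is a generic stand-in, not STRIDE's motion prior or loss; the function name `refine_sequence`, the per-frame `weights`, and the `smooth`, `lr`, and `steps` parameters are all hypothetical.

```python
def refine_sequence(noisy, weights, smooth=1.0, lr=0.05, steps=500):
    """Refine a noisy pose sequence for a single video (illustrative only).

    noisy:   list of T frames, each a flat list of D joint coordinates
             (e.g. the output of any off-the-shelf pose estimator).
    weights: list of T per-frame confidences in [0, 1]; a low value marks
             a frame assumed to be occluded (hypothetical input).
    Minimizes  sum_t w_t * |x_t - noisy_t|^2 + smooth * sum_t |x_{t+1} - x_t|^2
    by plain gradient descent, starting from the noisy estimates.
    """
    T, D = len(noisy), len(noisy[0])
    x = [frame[:] for frame in noisy]  # start from the initial estimates
    for _ in range(steps):
        grad = [[0.0] * D for _ in range(T)]
        # Data term: stay close to confident initial estimates.
        for t in range(T):
            for d in range(D):
                grad[t][d] += 2.0 * weights[t] * (x[t][d] - noisy[t][d])
        # Smoothness term: penalize frame-to-frame jumps.
        for t in range(T - 1):
            for d in range(D):
                diff = x[t + 1][d] - x[t][d]
                grad[t + 1][d] += 2.0 * smooth * diff
                grad[t][d] -= 2.0 * smooth * diff
        # Simultaneous gradient step on all frames.
        for t in range(T):
            for d in range(D):
                x[t][d] -= lr * grad[t][d]
    return x


if __name__ == "__main__":
    # One coordinate moving linearly over 11 frames; frame 5 is corrupted.
    clean = [[0.1 * t] for t in range(11)]
    noisy = [f[:] for f in clean]
    noisy[5][0] = 5.0
    weights = [1.0] * 11
    weights[5] = 0.0  # treat the corrupted frame as occluded
    refined = refine_sequence(noisy, weights)
    print(refined[5][0])
```

With the default settings the step size stays below the stability limit for this quadratic objective; raising `smooth` substantially would require a smaller `lr`. A learned motion prior, as the abstract describes, would replace the hand-written smoothness term.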