HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud
CVPR 2024
Abstract
Extracting keypoint locations from input hand frames, known as 3D hand pose
estimation, is a critical task in various human-computer interaction
applications. Essentially, 3D hand pose estimation can be regarded as the
generation of a 3D point subset conditioned on the input frames. Thanks to
recent significant progress on diffusion-based generative models, hand pose
estimation can also benefit from diffusion models to estimate keypoint
locations with high quality. However, directly deploying existing diffusion
models for hand pose estimation is non-trivial, since they cannot model the
complex permutation mapping between keypoints or achieve precise localization.
Motivated by this, this paper proposes HandDiff, a diffusion-based hand pose
estimation model that iteratively denoises an accurate hand pose conditioned on
hand-shaped image-point clouds. To recover the keypoint permutation and
accurate locations, we further introduce a joint-wise condition and a local
detail condition. Experimental results demonstrate that the proposed HandDiff
significantly outperforms existing approaches on four challenging hand pose
benchmark datasets. Code and pre-trained models are publicly available at
https://github.com/cwc1260/HandDiff.
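The iterative denoising described above can be sketched as standard DDPM-style reverse sampling over joint coordinates. This is a minimal illustration, not HandDiff's actual implementation: the joint count, number of steps, noise schedule, and the placeholder denoiser (which HandDiff realizes as a network conditioned on image-point cloud features) are all assumptions for the sketch.

```python
import numpy as np

NUM_JOINTS = 21  # hypothetical: a common hand-skeleton joint count
STEPS = 10       # hypothetical number of denoising steps

# Linear noise schedule (assumption; the paper's schedule is not given here).
betas = np.linspace(1e-4, 0.02, STEPS)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(noisy_pose, t, condition):
    """Stand-in for the learned network that predicts the noise added to
    the 3D joint coordinates, conditioned on image-point cloud features
    (here it simply returns zeros)."""
    return np.zeros_like(noisy_pose)

def sample_pose(condition, rng):
    """Reverse diffusion: start from Gaussian noise over the joint
    coordinates and iteratively denoise them toward a pose estimate."""
    pose = rng.standard_normal((NUM_JOINTS, 3))
    for t in reversed(range(STEPS)):
        eps = denoiser(pose, t, condition)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        pose = (pose - coef * eps) / np.sqrt(alphas[t])
        if t > 0:  # add sampling noise at all but the final step
            pose += np.sqrt(betas[t]) * rng.standard_normal(pose.shape)
    return pose

rng = np.random.default_rng(0)
pose = sample_pose(condition=None, rng=rng)
print(pose.shape)  # (21, 3): one 3D location per joint
```

In the actual model, the joint-wise and local detail conditions would enter through the denoiser's conditioning input, steering each denoising step toward the correct joint identity and precise location.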