Domain-Guided Spatio-Temporal Self-Attention for Egocentric 3D Pose Estimation.

KDD(2023)

引用 0|浏览32
暂无评分
摘要
Vision-based ego-centric 3D human pose estimation (ego-HPE) is essential to support critical applications of xR-technologies. However, severe self-occlusions and strong distortion introduced by the fish-eye view from the head mounted camera, make ego-HPE extremely challenging. To address these challenges, we propose a domain-guided spatio-temporal transformer model that leverages information specific to ego-views. Powered by this domain-guided transformer, we build Egocentric Spatio-Temporal Self-Attention Network (Ego-STAN), which uses 2D image representations and spatio-temporal attention to address both distortions and self-occlusions in ego-HPE. Additionally, we introduce a spatial concept called feature map tokens (FMT) which endows Ego-STAN with the ability to draw complex spatio-temporal information encoded in egocentric videos. Our quantitative evaluation on the contemporary xR-EgoPose dataset, achieves a 38.2% improvement on the highest error joints against the SOTA ego-HPE model, while accomplishing a 22% decrease in the number of parameters. Finally, we also demonstrate the generalization capabilities of our model to real-world HPE tasks beyond ego-views achieving 7.7% improvement on 2D human pose estimation with the Human3.6M dataset. Our code is also made available at: https://github.com/jmpark0808/Ego-STAN.
更多
查看译文
关键词
Egocentric pose estimation,Spatio-temporal models,Transformers,Self-occlusions in videos,Fish-eye distortion
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要