End-to-end 3D Human Pose Estimation using Dual Decoders

sciencepaper_online(2024)

引用 0|浏览0
暂无评分
摘要
Abstract: Existing methods for 3D human pose estimation mainly divide the task into two stages. The first stage identifies the 2D coordinates of the human joints in the input image, namely the 2D human joint coordinates. The second stage uses the results from the first stage as input to recover the depth information of human joints from the 2D human joint coordinates to achieve 3D human pose estimation. However, the recognition accuracy of the two-stage method relies heavily on the results of the first stage and includes too many redundant processing steps, which reduces the inference efficiency of the network. To address these issues, we propose the EDD, a fully End-to-end 3D human pose estimation method based on transformer architecture with Dual Decoders. By learning multiple human poses, the model can directly infer all 3D human poses in the image using a pose decoder, and then further optimize the recognition result using a joint decoder based on the kinematic relations between joints. With the attention mechanism, this method can adaptively focus on the most relevant features to the target joint, effectively overcoming the feature misalignment problem in the human pose estimation task and greatly improving the model performance. Any complex post-processing step, such as non-maximum suppression, is eliminated, further improving the efficiency of the model. The results show that this method achieves an accuracy of 87.4\% on the MuPoTS-3D dataset, significantly improving the accuracy of the end-to-end 3D human pose estimation method based on mixed training.
更多
查看译文
关键词
Artificial intelligence,Computer vision,3D human pose estimation,Transformer
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要