An efficient and accurate 2D human pose estimation method using VTTransPose network

Rui Li, Qi Li, Duo He, Xin Zeng, Yan An

Research Square (2023)

Abstract
Human pose estimation is an important research direction in computer vision, and transformer-based pose estimation algorithms are favored for their strong performance and low parameter count. However, these algorithms suffer from high computational complexity and insensitivity to local details. To address these problems, a twin attention module is introduced into the TransPose model to improve efficiency and reduce resource consumption. In addition, because low-quality joint feature representations lead to poor recognition, the intra-level feature fusion module V block replaces the basic block in the third subnet of the CNN backbone of the TransPose model. The resulting improved pose estimation network is named VTTransPose. VTTransPose achieves AP scores of 76.5 and 73.6 on COCO val2017 and COCO test-dev2017, improvements of 0.4 and 0.2 over the original TransPose network. Moreover, its FLOPs are reduced by 4.8G, its parameter count by 2M, and its memory usage during training by about 40%. The experimental results demonstrate that the proposed VTTransPose is more accurate, more efficient, and more lightweight than the original TransPose model.
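The reported gains can be read back into implied baseline figures. The following is a minimal arithmetic sketch that takes the abstract's numbers at face value. The variable and function names are illustrative assumptions, and the derived baseline values are implied by the abstract, not quoted from the paper.

```python
# Figures quoted in the abstract (AP on COCO, in points).
vt_ap = {"val2017": 76.5, "test-dev2017": 73.6}
ap_gain = {"val2017": 0.4, "test-dev2017": 0.2}


def implied_baseline_ap(split: str) -> float:
    """Baseline TransPose AP implied by 'VTTransPose AP minus reported gain'.

    Hypothetical helper for illustration; rounds to one decimal place,
    matching the precision used in the abstract.
    """
    return round(vt_ap[split] - ap_gain[split], 1)


# "About 40%" less training memory means roughly 60% of the baseline remains.
memory_ratio = round(1.0 - 0.40, 2)

for split in vt_ap:
    print(split, "implied baseline AP:", implied_baseline_ap(split))
print("training memory relative to baseline:", memory_ratio)
```

The abstract gives the FLOPs and parameter reductions only as absolute differences (4.8G and 2M), so relative savings cannot be derived from it without the baseline totals.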
Keywords: accurate 2D human