DANet: dual association network for human pose estimation in video

Multimedia Tools and Applications (2023)

Abstract
Human pose estimation (HPE) is a critical problem in computer vision and serves as a foundation for many downstream tasks. However, existing image-based methods tend to perform poorly on video sequences, especially in complex scenes with motion blur and severe occlusion, which motivates a pose estimation network designed specifically for video. In this paper, we propose the Dual Association Network (DANet), a human pose estimation network designed explicitly for video sequences. It exploits both the temporal information between video frames and the correlation between joints. The framework consists of three modules. The Dual Fusion Network (DFN) uses temporal information from adjacent frames to compute position offsets and infer the positions of blurred joints in the current frame. The Joint Association Network (JAN) models the correlation between joints and infers invisible joints from visible ones. The SpatioTemporal Fusion (STF) module applies deformable convolutions to fuse the outputs of DFN and JAN and refine the final prediction. Together, the three modules yield a 1.4 AP improvement in ankle joint detection, particularly in cases where the joint is occluded or blurred by motion. Our method achieves competitive results on two large benchmark datasets, PoseTrack2017 and PoseTrack2018.
Keywords
Human pose estimation in video, Dual fusion network, Joint association network, Spatiotemporal fusion module
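
The three-module flow described in the abstract can be pictured with a toy example on 2D coordinates. This is only an illustrative sketch, not the paper's implementation: DANet operates on learned features and uses deformable convolutions for fusion, whereas the code below substitutes a midpoint between adjacent frames for the temporal cue, a fixed knee-to-ankle offset for the joint-correlation cue, and a fixed-weight average for the fusion step. All function names, coordinates, and the offset value are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def temporal_estimate(prev: Point, nxt: Point) -> Point:
    """Guess a joint's current position as the midpoint of its positions in
    the previous and next frames (assumes roughly linear motion).
    Stand-in for the temporal cue; not the paper's DFN."""
    return ((prev[0] + nxt[0]) / 2.0, (prev[1] + nxt[1]) / 2.0)


def association_estimate(anchor: Point, typical_offset: Point) -> Point:
    """Guess a hidden joint from a visible neighbouring joint plus a typical
    offset. Stand-in for the joint-correlation cue; not the paper's JAN."""
    return (anchor[0] + typical_offset[0], anchor[1] + typical_offset[1])


def fuse(a: Point, b: Point, weight_a: float = 0.5) -> Point:
    """Blend two estimates with a fixed weight.
    Stand-in for learned fusion; not the paper's STF."""
    weight_b = 1.0 - weight_a
    return (weight_a * a[0] + weight_b * b[0],
            weight_a * a[1] + weight_b * b[1])


# Hypothetical data: the ankle is blurred or occluded in the current frame.
ankle_prev, ankle_next = (100.0, 200.0), (110.0, 204.0)
knee_now = (104.0, 160.0)
knee_to_ankle = (2.0, 40.0)  # assumed typical offset, not from the paper

temporal = temporal_estimate(ankle_prev, ankle_next)    # (105.0, 202.0)
spatial = association_estimate(knee_now, knee_to_ankle)  # (106.0, 200.0)
ankle_now = fuse(temporal, spatial)
print(ankle_now)  # (105.5, 201.0)
```

The point of the sketch is the division of labour: one estimate comes from neighbouring frames, one from neighbouring joints, and a final step reconciles the two. In the actual network each of these stages is learned rather than hand-set.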