Parallel Self-Attention and Spatial-Attention Fusion for Human Pose Estimation and Running Movement Recognition

IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS (2024)

Abstract
Human pose estimation (HPE) is a fundamental yet promising visual recognition problem. Existing popular methods either directly add local features element-wise (e.g., Hourglass and its variants) or learn the global relationships among different human parts (e.g., vision transformers). However, it remains an open problem to effectively integrate local and global representations for accurate HPE. In this work, we design four feature fusion strategies on the hierarchical ResNet structure, including direct channel concatenation, element-wise addition, and two parallel structures. Both parallel structures adopt the naive self-attention encoder to model global dependencies. The difference between them is that one adopts the original ResNet BottleNeck, while the other employs a spatial-attention module (named SSF) to learn the local patterns. Experiments on COCO Keypoint 2017 show that our SSF for HPE (named SSPose) achieves the best average precision with acceptable computational cost among the compared state-of-the-art methods. In addition, we build a lightweight running data set to verify the effectiveness of SSPose. Based solely on the keypoints estimated by our SSPose, we propose a regression model to identify valid running movements without training any other classifiers. Our source codes and running data set are publicly available.
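The parallel structure described in the abstract (a self-attention encoder for global dependencies running alongside a spatial-attention branch for local patterns, with the two outputs fused) could be sketched roughly as below. This is a minimal illustration, not the authors' released code: the module names (SpatialAttention, ParallelFusionBlock), the CBAM-style spatial attention, and the choice of element-wise addition as the fusion step are assumptions for illustration.

```python
# Hypothetical sketch of the parallel self-attention / spatial-attention fusion
# idea from the abstract. Not the authors' implementation; names and the fusion
# choice (element-wise addition) are assumptions.
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: reweight each location from pooled statistics."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)   # (B, 1, H, W)
        max_map = x.amax(dim=1, keepdim=True)   # (B, 1, H, W)
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                         # locally reweighted features


class ParallelFusionBlock(nn.Module):
    """Run a self-attention encoder and a spatial-attention branch in parallel."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.global_branch = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True
        )
        self.local_branch = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Global branch: flatten the spatial grid into tokens for self-attention.
        tokens = x.flatten(2).transpose(1, 2)              # (B, H*W, C)
        global_feat = self.global_branch(tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        # Local branch: spatial attention over the same feature map.
        local_feat = self.local_branch(x)
        # Element-wise addition is one of the fusion strategies named in the abstract.
        return global_feat + local_feat


if __name__ == "__main__":
    feat = torch.randn(2, 256, 16, 16)                     # backbone feature map
    print(ParallelFusionBlock(256)(feat).shape)            # torch.Size([2, 256, 16, 16])
```

In this sketch, the alternative parallel variant mentioned in the abstract would simply replace the SpatialAttention branch with a standard ResNet BottleNeck block, keeping the rest of the fusion unchanged.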
Keywords
Transformers, Semantics, Pose estimation, Feature extraction, Convolutional neural networks, Task analysis, Visualization, Feature fusion, human pose estimation (HPE), running recognition, self-attention, spatial attention