Fusing Skeletons and Texts Based on GCN-CNN for Action Recognition.

Lijuan Zhou, Zhihuan Liu, Weicong Zhang, Di Fan, Xiaojie Qian

ICDIP '23: Proceedings of the 15th International Conference on Digital Image Processing (2023)

Abstract
This paper proposes a semantic action recognition method that fuses skeleton sequences with textual descriptions of actions. To obtain detailed semantic information, an action captioning method generates a textual description of each action, and a fusion network then performs recognition using the generated descriptions. The fusion network captures both the spatial and temporal relations of skeleton sequences and the semantics of the descriptions. It consists of three modules: a spatial module, based on a graph convolutional network, that models the spatial relations among joints; a textual module that extracts textual features; and a temporal module, based on a convolutional network, that extracts the temporal relations of the fused features. In experiments on the WorkoutUOW-18 and MSRC-12 datasets, the proposed method outperformed both recognition from visual data alone and other semantic action recognition methods. This improvement demonstrates the effectiveness of fusing textual descriptions for action recognition.
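The three-module structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify layer sizes, the text encoder, or how the features are fused, so the sketch assumes a single graph-convolution layer, a mean-of-word-embeddings stand-in for the textual module, fusion by concatenating the text feature onto every frame, and randomly initialized weights. All function and parameter names (`fusion_forward`, `make_demo_params`, and so on) are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a joint graph."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def spatial_module(skeleton, a_norm, w_spatial):
    """One graph-convolution layer. skeleton: (T, J, C) -> (T, F_s)."""
    h = np.einsum("ij,tjc->tic", a_norm, skeleton)  # aggregate neighboring joints
    h = np.maximum(h @ w_spatial, 0.0)              # linear map + ReLU, (T, J, F_s)
    return h.mean(axis=1)                           # pool over joints


def textual_module(tokens, vocab, embeddings):
    """Assumed stand-in text encoder: mean of word embeddings -> (F_t,)."""
    ids = [vocab[t] for t in tokens if t in vocab]
    if not ids:
        return np.zeros(embeddings.shape[1])
    return embeddings[ids].mean(axis=0)


def temporal_module(fused, kernels):
    """1D convolution over time (valid padding). fused: (T, D); kernels: (K, D, F_o)."""
    k = kernels.shape[0]
    steps = fused.shape[0] - k + 1
    out = np.stack(
        [np.einsum("kd,kdf->f", fused[t:t + k], kernels) for t in range(steps)]
    )
    return np.maximum(out, 0.0)  # ReLU, (T - K + 1, F_o)


def fusion_forward(skeleton, tokens, params):
    """Skeleton + description -> class probabilities (fusion by concatenation, assumed)."""
    a_norm = normalize_adjacency(params["adj"])
    spatial = spatial_module(skeleton, a_norm, params["w_spatial"])
    text = textual_module(tokens, params["vocab"], params["embeddings"])
    # Assumption: repeat the text feature on every frame and concatenate.
    fused = np.concatenate([spatial, np.tile(text, (spatial.shape[0], 1))], axis=1)
    temporal = temporal_module(fused, params["kernels"])
    logits = temporal.mean(axis=0) @ params["w_cls"]  # pool over time, then classify
    exp = np.exp(logits - logits.max())               # numerically stable softmax
    return exp / exp.sum()


def make_demo_params(num_joints, in_dim, vocab_words, num_classes, seed=0):
    """Random, untrained weights for illustration only."""
    rng = np.random.default_rng(seed)
    f_s, f_t, f_o, k = 8, 6, 10, 3  # arbitrary feature sizes and kernel length
    adj = np.zeros((num_joints, num_joints))
    for j in range(num_joints - 1):  # toy chain skeleton: joint j linked to j + 1
        adj[j, j + 1] = adj[j + 1, j] = 1.0
    return {
        "adj": adj,
        "w_spatial": rng.normal(size=(in_dim, f_s)),
        "vocab": {w: i for i, w in enumerate(vocab_words)},
        "embeddings": rng.normal(size=(len(vocab_words), f_t)),
        "kernels": rng.normal(size=(k, f_s + f_t, f_o)) * 0.1,
        "w_cls": rng.normal(size=(f_o, num_classes)) * 0.1,
    }
```

For example, calling `fusion_forward` with a random `(20, 15, 3)` skeleton array, a tokenized caption such as `["person", "raises", "arms"]`, and `make_demo_params(15, 3, ["person", "raises", "arms"], 5)` returns a length-5 probability vector. In a trained system the weights would be learned and the caption would come from the action captioning model rather than being supplied by hand.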