Sign Language Recognition via Deformable 3D Convolutions and Modulated Graph Convolutional Networks

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
Automatic sign language recognition (SLR) remains challenging, especially when employing RGB video alone (i.e., with no depth or special glove-based input) and under a signer-independent (SI) framework, due to inter-personal signing variation. In this paper, we address SI isolated SLR from RGB video, proposing an innovative deep-learning framework that leverages multi-modal appearance- and skeleton-based information. Specifically, we propose three components for the first time in SLR: (i) a modified version of the ResNet2+1D network to capture signing appearance information, where spatial and temporal convolutions are substituted by their deformable counterparts, combining strong spatial modeling capacity with motion-aware adaptability; (ii) a novel spatio-temporal graph convolutional network (ST-GCN) that integrates a GCN variant with weight and affinity modulation, modeling diverse correlations between body joints beyond the physical human skeleton structure, followed by a self-attention layer and a temporal convolution; and (iii) the "PIXIE" 3D human pose and shape regressor to generate the 3D joint-rotation parameterization used for ST-GCN graph construction. Both appearance- and skeleton-based streams are ensembled in the proposed system and evaluated on two datasets of isolated signs, one in Turkish and one in Greek. Our system outperforms the state-of-the-art on the second set, yielding a 53% relative error-rate reduction (2.45% absolute), while performing on par with the best reported system on the first.
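To make component (ii) more concrete, the following is a minimal sketch (assumed PyTorch implementation, not the authors' released code) of a graph convolution with weight and affinity modulation: learnable additive offsets on the skeleton adjacency let the layer capture joint correlations beyond the physical bone structure, while a per-joint modulation vector rescales the shared weight matrix. All names, shapes, and the placeholder adjacency below are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ModulatedGraphConv(nn.Module):
    """Graph convolution with weight and affinity modulation (illustrative sketch)."""

    def __init__(self, in_features, out_features, adjacency):
        super().__init__()
        num_joints = adjacency.size(0)
        # Shared feature transform, as in a standard GCN layer.
        self.weight = nn.Parameter(torch.empty(in_features, out_features))
        nn.init.xavier_uniform_(self.weight)
        # Weight modulation: a per-joint vector that rescales the shared transform.
        self.modulation = nn.Parameter(torch.ones(num_joints, out_features))
        # Affinity modulation: learnable additive offsets on the adjacency, so
        # connections are not limited to the physical skeleton edges.
        self.register_buffer("adjacency", adjacency)
        self.affinity = nn.Parameter(torch.zeros_like(adjacency))

    def forward(self, x):
        # x: (batch, num_joints, in_features)
        a = self.adjacency + self.affinity   # modulated affinity matrix
        h = x @ self.weight                  # shared linear transform
        h = h * self.modulation              # per-joint weight modulation
        return torch.relu(a @ h)             # aggregate features over joints


# Hypothetical usage: 25 body joints with 3D joint-rotation features per joint.
if __name__ == "__main__":
    joints = 25
    skeleton = torch.eye(joints)             # placeholder for the skeleton adjacency
    layer = ModulatedGraphConv(in_features=3, out_features=64, adjacency=skeleton)
    out = layer(torch.randn(8, joints, 3))   # -> (8, 25, 64)
    print(out.shape)
```

In the full model, such a layer would be followed by the self-attention layer and temporal convolution mentioned above, with the graph built from the PIXIE joint-rotation output; those stages are omitted here for brevity.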
Keywords
SI isolated sign language recognition, deformable 3D-CNN, ST-GCN, modulated GCN, "PIXIE"