Visual feature segmentation with reinforcement learning for continuous sign language recognition

Yuchun Fang, Liangjun Wang, Shiquan Lin, Lan Ni

International Journal of Multimedia Information Retrieval (2023)

Abstract
Continuous sign language recognition (CSLR) takes as input a video containing unbroken signs and outputs a prediction of the sign gloss sequence. Our research found that the visual features extracted from different signs in a sign language video show a noticeable disparity. We therefore employ reinforcement learning (RL) to segment the video's visual features into multiple groups to aid model training. Compared with previous CSLR methods, our approach yields a more fine-grained supervised training process, leading to more effective gradient backpropagation and improved model performance. We introduce a novel method named "Visual Feature Segmentation with Reinforcement Learning (VFS-RL)" for CSLR. First, we construct an end-to-end continuous sign language recognition network. Then, we design an auxiliary multi-class recognition task that uses RL to group the video's visual features, improving the model's ability to extract semantic information from sign videos. Finally, we conduct experiments on two public CSLR datasets; the results of our ablation studies demonstrate the effectiveness of the proposed method, and our approach achieves competitive results compared with other methods.
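
The abstract only describes the feature-grouping idea at a high level. Below is a minimal, hypothetical sketch (not the authors' released code) of how RL-based boundary selection over per-frame visual features, combined with an auxiliary multi-class recognition signal, could look in PyTorch. All names (BoundaryPolicy, segment), dimensions, the reward, and the baseline are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: REINFORCE-style segmentation of per-frame visual features
# into groups, with a gloss classifier providing an auxiliary loss and a reward.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryPolicy(nn.Module):
    """Predicts, for each frame, the probability that a new feature group starts there."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, feats):                          # feats: (T, feat_dim)
        return torch.sigmoid(self.net(feats)).squeeze(-1)   # (T,) boundary probabilities

def segment(feats, boundaries):
    """Mean-pool consecutive frames between sampled boundaries into group features."""
    groups, start = [], 0
    for t in range(1, feats.size(0)):
        if boundaries[t] > 0.5:                        # a new group starts at frame t
            groups.append(feats[start:t].mean(dim=0))
            start = t
    groups.append(feats[start:].mean(dim=0))
    return torch.stack(groups)                         # (num_groups, feat_dim)

# Toy usage: one clip of T frames, a small gloss vocabulary, and one update step.
T, feat_dim, num_gloss = 40, 512, 10
feats = torch.randn(T, feat_dim)                       # stand-in for extracted visual features
target_glosses = torch.tensor([2, 5, 7])               # hypothetical gloss labels for this clip

policy = BoundaryPolicy(feat_dim)
classifier = nn.Linear(feat_dim, num_gloss)
optimizer = torch.optim.Adam(list(policy.parameters()) + list(classifier.parameters()), lr=1e-4)

probs = policy(feats)                                  # per-frame boundary probabilities
dist = torch.distributions.Bernoulli(probs)
actions = dist.sample()                                # sampled segmentation (0/1 per frame)
group_feats = segment(feats, actions)                  # non-differentiable grouping step
logits = classifier(group_feats)                       # (num_groups, num_gloss)

# Reward: fraction of target glosses that appear among the predicted group labels
# (an illustrative reward design only).
pred = logits.argmax(dim=-1)
reward = torch.isin(target_glosses, pred).float().mean()

# Auxiliary multi-class loss on the pooled groups plus a policy-gradient term.
n = min(logits.size(0), target_glosses.size(0))
aux_loss = F.cross_entropy(logits[:n], target_glosses[:n])
pg_loss = -(dist.log_prob(actions).sum() * (reward - 0.5))   # 0.5 acts as a crude baseline

optimizer.zero_grad()
(aux_loss + pg_loss).backward()
optimizer.step()
```

In this sketch the grouping step is non-differentiable, so the policy only receives a learning signal through the policy-gradient term, while the classifier is trained directly by the auxiliary multi-class loss; in the actual VFS-RL model the reward and classifier would be tied to the full end-to-end CSLR network rather than the toy components used here.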
Keywords
Video recognition, Sign language recognition, Reinforcement learning, Deep learning