Combinational sign language recognition

Computer Vision and Image Understanding (2024)

Abstract
Traditional Sign Language Recognition (SLR) suffers from the limited scale of SL datasets, which may lead to over-fitting to narrow contexts and applications. In this paper, to solve this problem, we propose for the first time a Combinational Sign Language Recognition (CombSLR) framework, which serves as an augmentation that extends existing datasets by combining continuous videos (called Template) and isolated videos (called Entity). The CombSLR framework is trained on combinational SL data (T & E) and applied to continuous SL data. However, because the combination location is unknown and the context is inconsistent between any T-E pair, naively inserting E into T is infeasible. To tackle this issue, we propose a simple yet effective method named EinT, which contains two main modules: (1) Location Candidate Prediction, which produces a reliable insertion location by considering the inter-frame relationship and makes the network end-to-end trainable; and (2) Feature Insertion via Context Passing, which eliminates the context inconsistency between the T and E features. EinT is easily compatible with existing SLR models and effectively implements data augmentation at the feature level during the training stage. We conduct extensive experiments on multiple publicly available sign language datasets, e.g., CCLS, CSL+DEVISIGN-D, and CSL-Daily+DEVISIGN-D. The experimental results show that CombSLR significantly improves existing SLR methods, e.g., by an average of 15.1% on the CCLS dataset and 6.4% on the CSL dataset in terms of WER, which demonstrates the superiority of the CombSLR framework.
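The abstract only sketches EinT's two modules at a high level. Below is a minimal, hypothetical PyTorch-style sketch of the general idea: score inter-frame gaps of the Template feature sequence to pick an insertion location, let the Entity features absorb the Template's context, then splice them in at feature level. All module names, tensor shapes, the cross-attention used for context passing, and the hard location pick are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class EinTSketch(nn.Module):
    """Hypothetical sketch of EinT-style feature insertion (not the paper's code)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        # Location Candidate Prediction (assumed form): score each adjacent
        # frame pair of the Template as a candidate insertion gap.
        self.loc_scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )
        # Context Passing (assumed form): cross-attention lets the Entity
        # features attend to the Template so the insert matches its context.
        self.context_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, t_feat: torch.Tensor, e_feat: torch.Tensor) -> torch.Tensor:
        # t_feat: (B, T, D) Template features; e_feat: (B, L, D) Entity features
        pairs = torch.cat([t_feat[:, :-1], t_feat[:, 1:]], dim=-1)  # (B, T-1, 2D)
        scores = self.loc_scorer(pairs).squeeze(-1)                 # (B, T-1)
        # Hard argmax is a simplification here; the paper's end-to-end
        # training mechanism is not specified in the abstract.
        loc = scores.softmax(dim=-1).argmax(dim=-1)                 # (B,)
        e_ctx, _ = self.context_attn(e_feat, t_feat, t_feat)        # (B, L, D)
        out = []
        for b in range(t_feat.size(0)):
            i = int(loc[b]) + 1
            out.append(torch.cat([t_feat[b, :i], e_ctx[b], t_feat[b, i:]], dim=0))
        return torch.stack(out)                                     # (B, T+L, D)

# Usage example with made-up shapes: a 40-frame Template and a 6-frame Entity.
ein_t = EinTSketch(dim=512)
combined = ein_t(torch.randn(2, 40, 512), torch.randn(2, 6, 512))   # (2, 46, 512)
```

The combined feature sequence would then be fed to an existing continuous-SLR backbone during training, which is how a feature-level augmentation of this kind stays model-agnostic.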
Keywords
Sign language recognition (SLR), Combinational learning, Location prediction, Feature insertion, Context passing