Deep learning models beyond temporal frame-wise features for hand gesture video recognition

The Journal of Supercomputing (2024)

Abstract

Recurrent neural networks (RNNs) are widely used to capture spatiotemporal features in video data. However, their effectiveness depends heavily on the spatial features on which they are trained. This paper introduces ensembles of features for constructing frame-wise representations, produced by neural network models with dedicated training pipelines. These features are designed to improve RNN-based recognition of hand gesture videos by leveraging temporal information. Recognizing hand gestures from videos is a challenging task. One notable challenge is the overlap in gesture motion, where different gesture categories exhibit similar hand poses within a single video clip. To address this issue, we develop extensive and diverse features that describe the gesture video clips more comprehensively, thereby mitigating recognition problems caused by overlapping images. The resulting diverse features yield promising results, particularly in scenarios where overlap poses a significant challenge. We combine features extracted from a deep neural network trained from scratch with features obtained from standard neural networks (Self-Organizing Map, Radial Basis Function) that are trained to enhance the deep features. Combining these shared features yields new frame-wise image features. Furthermore, we compare the performance of the newly constructed frame-wise features when they are used, through time-sharing, to train an RNN for recognition. The proposed models are evaluated on two hand gesture video datasets in which preserving the gesture sequence is crucial because of overlapping motions. Our work demonstrates a significant improvement in performance on both datasets.
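
The feature-combination idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify how the features are fused, so concatenation is an assumption, as are all names (`rbf_features`, `som_features`, `fuse_frames`, `rnn_last_state`), the dimensions, the random stand-ins for trained deep features, RBF centers, and SOM codebook, and the plain Elman recurrence used in place of whatever RNN variant the paper trains.

```python
import numpy as np

def rbf_features(x, centers, gamma=1.0):
    # Gaussian radial basis activations of each frame vector against each center.
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def som_features(x, codebook):
    # One-hot code of the best-matching unit in a (hypothetically pre-trained) SOM codebook.
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    bmu = d2.argmin(axis=1)
    return np.eye(codebook.shape[0])[bmu]

def fuse_frames(deep, centers, codebook):
    # Assumed fusion rule: concatenate deep, RBF, and SOM descriptors per frame.
    return np.concatenate(
        [deep, rbf_features(deep, centers), som_features(deep, codebook)], axis=1
    )

def rnn_last_state(seq, w_in, w_rec):
    # Plain Elman recurrence; frames are consumed in order, so sequence order matters.
    h = np.zeros(w_rec.shape[0])
    for frame in seq:
        h = np.tanh(frame @ w_in + h @ w_rec)
    return h

rng = np.random.default_rng(0)
T, D, K, M, H = 16, 32, 8, 9, 24          # frames, deep dim, RBF centers, SOM units, hidden size
deep = rng.normal(size=(T, D))            # stand-in for per-frame deep features
centers = rng.normal(size=(K, D))         # stand-in for trained RBF centers
codebook = rng.normal(size=(M, D))        # stand-in for a trained SOM codebook
fused = fuse_frames(deep, centers, codebook)
w_in = rng.normal(scale=0.1, size=(fused.shape[1], H))
w_rec = rng.normal(scale=0.1, size=(H, H))
state = rnn_last_state(fused, w_in, w_rec)
```

Each frame ends up with a descriptor of width D + K + M, and the final hidden state depends on the order in which frames are presented, which is the property the abstract relies on when gestures share similar hand poses.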

Keywords

Frame-wise image features, Video gesture recognition, Recurrent neural networks, Deep neural network, Self-organizing map, Radial basis function
