Multi-region CNN-Transformer for Micro-gesture Recognition in Face and Upper Body.

ACM Multimedia Asia (2023)

Abstract
This paper presents a novel task: recognizing unintentional micro-gestures (UMGs) from video. UMGs are movements that people make unconsciously and unintentionally. Recognizing UMGs is important because they reveal a person’s underlying psychological state. Since a UMG is composed of subtle sequential movements, the recognition model must capture accurate information in both the spatial and temporal dimensions. We therefore use a convolutional neural network (CNN) to capture spatial information and a Transformer to merge the CNN-extracted features along the temporal dimension. However, this model often misrecognizes UMGs because it cannot capture slight differences in movements, such as those in the face and mouth regions. To address this issue, we propose a novel model for UMG recognition, the Multi-Region CNN-Transformer model, which takes cropped videos from multiple upper-body regions as simultaneous input. The key advance of our method is that it captures subtle changes in regions such as the upper body, face, head, and mouth when recognizing UMGs. We demonstrate the effectiveness of the proposed method through experiments on our newly created UMG dataset for this task.
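
The multi-region data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the region boxes, the `frame_feature` placeholder (standing in for a CNN), and fusion by concatenation are all assumptions made for illustration, since the abstract does not specify how regions are located or how their features are combined. The sketch only shows how one video could become several cropped region clips and then a per-frame token sequence that a temporal model such as a Transformer would consume.

```python
from typing import Dict, List, Tuple

Frame = List[List[float]]        # one grayscale frame as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, height, width); hypothetical format

# Hypothetical fixed boxes on an 8x8 frame. The abstract names these regions
# but does not say how they are detected or sized.
REGIONS: Dict[str, Box] = {
    "upper_body": (0, 0, 8, 8),
    "head": (0, 2, 4, 4),
    "face": (1, 2, 3, 4),
    "mouth": (3, 3, 1, 2),
}


def crop(frame: Frame, box: Box) -> Frame:
    """Cut one rectangular region out of a frame."""
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]


def crop_regions(video: List[Frame], regions: Dict[str, Box]) -> Dict[str, List[Frame]]:
    """Turn one video into one cropped clip per region."""
    return {name: [crop(f, box) for f in video] for name, box in regions.items()}


def frame_feature(frame: Frame) -> List[float]:
    """Placeholder for a CNN: returns [mean intensity, max intensity]."""
    pixels = [p for row in frame for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]


def temporal_tokens(video: List[Frame], regions: Dict[str, Box]) -> List[List[float]]:
    """Build one token per frame by concatenating per-region features.

    Concatenation is an assumed fusion strategy; the resulting sequence is
    what a temporal model would take as input.
    """
    clips = crop_regions(video, regions)
    tokens = []
    for t in range(len(video)):
        token: List[float] = []
        for name in regions:
            token.extend(frame_feature(clips[name][t]))
        tokens.append(token)
    return tokens


if __name__ == "__main__":
    # Synthetic 5-frame, 8x8 video used only to show the shapes involved.
    video = [[[float(t + r + c) for c in range(8)] for r in range(8)] for t in range(5)]
    tokens = temporal_tokens(video, REGIONS)
    print(len(tokens), len(tokens[0]))
```

With four regions and a two-value placeholder feature, each frame yields an eight-value token, so a five-frame clip becomes a sequence of five tokens. In a real model the placeholder would be replaced by CNN features and the sequence passed to a Transformer encoder.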