MAWKDN: A Multimodal Fusion Wavelet Knowledge Distillation Approach Based on Cross-View Attention for Action Recognition

IEEE Transactions on Circuits and Systems for Video Technology (2023)

Abstract
Existing vision-based human action recognition (HAR) methods lose considerable accuracy under low camera resolution or occlusion. Wearable sensors can supply complementary information to alleviate this problem, but constructing a robust HAR model from multimodal wearable-sensor data is challenging. In this paper, we propose a cross-Attention-based Multimodal fusion Wavelet Knowledge Distillation Network (MAWKDN) that guides recognition from video data by acquiring complementary information from wearable sensors and reduces the effect of noise through wavelet knowledge distillation, improving the robustness of the model. A multi-attention dilated convolution kernel residual network, combining dilated convolution with an attention mechanism, extracts features from the various sensor modalities, and a cross-view attention method fuses the modal data to acquire additional information from the different modalities. To reduce the modal differences between the teacher and student networks and to acquire similar semantic knowledge, we learn the information between modalities by constructing a graph structure over convolutional-layer features and computing a semantic preservation loss between the teacher and student networks. To reduce the influence of noise in the input data, we construct a wavelet knowledge distillation loss, which transforms the image with the discrete wavelet transform and retains only the low-frequency features to extract the useful information. The top-1 accuracy on UTD-MHAD (99.31%) and Berkeley-MHAD (99.40%) and the F1-score on MMAct (85.26%, cross-session) demonstrate the superior performance of MAWKDN compared with state-of-the-art HAR methods. Moreover, we demonstrate the robustness of MAWKDN on the noise-added UTD-MHAD dataset.
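The wavelet knowledge distillation idea in the abstract (apply a discrete wavelet transform, keep only the low-frequency band, and compare teacher and student there) can be illustrated with a minimal sketch. The abstract does not specify the wavelet family, the number of decomposition levels, or the exact form of the loss, so the one-level Haar transform, the mean-squared-error comparison, and the function names `haar_low_band` and `wavelet_kd_loss` below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def haar_low_band(x):
    """One-level 2D orthonormal Haar DWT; returns only the low-frequency (LL) band.

    x: array of shape (..., H, W) with even H and W.
    Assumption: Haar is used here only because it is the simplest wavelet;
    the paper's abstract does not name the wavelet.
    """
    a = x[..., 0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    return (a + b + c + d) / 2.0

def wavelet_kd_loss(student, teacher):
    """Hypothetical distillation loss: MSE between the LL bands of two maps.

    The high-frequency bands, where much of the noise lives, are discarded
    before the comparison.
    """
    diff = haar_low_band(student) - haar_low_band(teacher)
    return float(np.mean(diff ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(8, 16, 16))             # stand-in feature maps
    student = teacher + 0.1 * rng.normal(size=teacher.shape)
    print(wavelet_kd_loss(student, teacher))
```

Under these assumptions, a perturbation that is purely high-frequency within each 2x2 block (for example a checkerboard of +1 and -1) sums to zero in the LL band and therefore does not change the loss, which is the sense in which keeping only the low-frequency features suppresses noise.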
Keywords
Human action recognition, multimodal, wavelet knowledge distillation, wearable sensor, attention mechanism