Cascading spatio-temporal attention network for real-time action detection

Machine Vision and Applications (2023)

Abstract
Accurately detecting human actions in video has many applications, such as video surveillance and somatosensory games. In this paper, we propose a spatial-aware attention module (SAM) and a temporal-aware attention module (TAM) for spatio-temporal action detection in videos. SAM first concatenates the feature maps of consecutive frames along the channel dimension and then applies a dilated convolutional layer followed by a sigmoid function to generate a spatial attention map. Because the resulting attention map contains spatial information from consecutive frames, it helps the detector focus on salient spatial features and localize action instances more accurately across those frames. TAM uses several fully connected layers to generate a temporal attention map. This map models the temporal association of each spatial feature, capturing how action instances relate over time and thereby improving the detector's ability to track actions. To evaluate the effectiveness of SAM and TAM, we build an efficient and strong anchor-free action detector, the cascading spatio-temporal attention network, equipped with a 2D backbone and the SAM and TAM modules. Extensive experiments on two benchmarks, JHMDB and UCF101-24, demonstrate the favorable performance of SAM and TAM.
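The two attention modules described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the kernel size, dilation rate, number or width of the fully connected layers, or how TAM arranges its inputs, so all of these are assumptions here: a 3x3 kernel with dilation 2 for SAM, and for TAM two fully connected layers applied to the channel-averaged activations of each spatial position across frames. The function names (`spatial_attention`, `temporal_attention`) and the toy weights are hypothetical, and plain Python lists are used instead of a deep learning framework to keep the example self-contained.

```python
import math
import random


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def spatial_attention(frames, kernel, dilation=2):
    """SAM-style sketch. frames: T maps shaped [C][H][W]; kernel: [T*C][k][k], k odd.
    Returns one [H][W] attention map shared by the consecutive frames."""
    # Concatenate the feature maps of consecutive frames along the channel axis.
    stacked = [channel for frame in frames for channel in frame]
    height, width = len(stacked[0]), len(stacked[0][0])
    k = len(kernel[0])
    half = (k // 2) * dilation  # zero padding that keeps the output H x W
    attention = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for c, plane in enumerate(stacked):
                for i in range(k):
                    for j in range(k):
                        yy = y + i * dilation - half
                        xx = x + j * dilation - half
                        if 0 <= yy < height and 0 <= xx < width:
                            total += kernel[c][i][j] * plane[yy][xx]
            # Sigmoid squashes the dilated-convolution response to a weight.
            attention[y][x] = sigmoid(total)
    return attention


def temporal_attention(frames, w1, w2):
    """TAM-style sketch (assumed layout). w1: [hidden][T], w2: [T][hidden].
    Returns a [T][H][W] map: one weight per frame at each spatial position."""
    num_frames, channels = len(frames), len(frames[0])
    height, width = len(frames[0][0]), len(frames[0][0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(num_frames)]
    for y in range(height):
        for x in range(width):
            # Channel-averaged activation of this position in every frame.
            series = [sum(frames[t][c][y][x] for c in range(channels)) / channels
                      for t in range(num_frames)]
            # First fully connected layer with ReLU.
            hidden = [max(0.0, sum(w * s for w, s in zip(row, series))) for row in w1]
            # Second fully connected layer with sigmoid, one output per frame.
            for t, row in enumerate(w2):
                out[t][y][x] = sigmoid(sum(w * h for w, h in zip(row, hidden)))
    return out


# Toy input (hypothetical sizes): T=3 frames, C=2 channels, 4x4 maps, random weights.
rng = random.Random(0)
T, C, H, W, HIDDEN = 3, 2, 4, 4, 5
frames = [[[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
           for _ in range(C)] for _ in range(T)]
kernel = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
          for _ in range(T * C)]
w1 = [[rng.uniform(-1, 1) for _ in range(T)] for _ in range(HIDDEN)]
w2 = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(T)]
sam_map = spatial_attention(frames, kernel)       # shape [H][W]
tam_map = temporal_attention(frames, w1, w2)      # shape [T][H][W]
```

How the attention maps are then combined with the backbone features (for example by elementwise multiplication) is not stated in the abstract, so the sketch stops at producing the maps.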
Keywords
Spatio-temporal action detection, Human behavior analysis, Spatio-temporal attention