ASTA-Net: Adaptive Spatio-Temporal Attention Network for Person Re-Identification in Videos

MM '20: The 28th ACM International Conference on Multimedia, Seattle, WA, USA, October 2020

Citations: 10 | Views: 116
Abstract
The attention mechanism has been widely applied to enhance pedestrian representations for person re-identification in videos. However, most existing methods learn spatial and temporal attention separately and thus ignore the correlation between them. In this work, we propose a novel Adaptive Spatio-Temporal Attention Network (ASTA-Net) that adaptively aggregates spatial and temporal attention features into a discriminative pedestrian representation for person re-identification in videos. Specifically, multiple Adaptive Spatio-Temporal Fusion modules within ASTA-Net are designed to explore precise spatio-temporal attention on multi-level feature maps. They first obtain preliminary spatial and temporal attention features from the spatial semantic relations within each frame and the temporal dependencies among inconsecutive frames, and then adaptively aggregate these preliminary attention features on the basis of their correlation. Moreover, an Adjacent-Frame Motion module is designed to explicitly extract motion patterns from the feature-level variation between adjacent frames. Extensive experiments on three widely used datasets, i.e., MARS, iLIDS-VID, and PRID2011, demonstrate the effectiveness of the proposed approach.
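The abstract describes the two key components only at a high level. Below is a minimal, hypothetical PyTorch sketch of how an Adaptive Spatio-Temporal Fusion step (per-frame spatial attention, cross-frame temporal attention, and a correlation-based gate for aggregation) and an Adjacent-Frame Motion step (feature-level differences between adjacent frames) might be wired together. All module names, layer choices, and tensor shapes are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSpatioTemporalFusion(nn.Module):
    """Hypothetical sketch: fuse per-frame spatial attention with cross-frame
    temporal attention through a learned, correlation-based gate.
    Input: clip features x of shape (B, T, C, H, W)."""
    def __init__(self, channels):
        super().__init__()
        self.spatial_q = nn.Conv2d(channels, channels // 8, 1)
        self.spatial_k = nn.Conv2d(channels, channels // 8, 1)
        self.temporal_fc = nn.Linear(channels, channels // 8)
        self.gate = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, x):
        b, t, c, h, w = x.shape
        frames = x.reshape(b * t, c, h, w)

        # Preliminary spatial attention: semantic relations among positions in each frame.
        q = self.spatial_q(frames).flatten(2)                  # (B*T, C/8, HW)
        k = self.spatial_k(frames).flatten(2)                  # (B*T, C/8, HW)
        rel = torch.softmax(q.transpose(1, 2) @ k, dim=-1)     # (B*T, HW, HW)
        v = frames.flatten(2)                                  # (B*T, C, HW)
        spatial_feat = (v @ rel.transpose(1, 2)).reshape(b, t, c, h, w)

        # Preliminary temporal attention: dependencies among (possibly inconsecutive) frames.
        pooled = x.mean(dim=(3, 4))                            # (B, T, C)
        e = self.temporal_fc(pooled)                           # (B, T, C/8)
        dep = torch.softmax(e @ e.transpose(1, 2), dim=-1)     # (B, T, T)
        temporal_feat = torch.einsum('bts,bschw->btchw', dep, x)

        # Adaptive aggregation: gate the two branches by their pooled correlation.
        s_vec = spatial_feat.mean(dim=(3, 4))
        t_vec = temporal_feat.mean(dim=(3, 4))
        g = self.gate(torch.cat([s_vec, t_vec], dim=-1)).unsqueeze(-1).unsqueeze(-1)
        return g * spatial_feat + (1.0 - g) * temporal_feat


class AdjacentFrameMotion(nn.Module):
    """Hypothetical sketch: derive motion cues from feature-level differences
    between adjacent frames and add them back as a residual."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):                                      # (B, T, C, H, W)
        diff = x[:, 1:] - x[:, :-1]                            # variation between adjacent frames
        diff = F.pad(diff, (0, 0, 0, 0, 0, 0, 0, 1))           # pad the last time step
        b, t, c, h, w = diff.shape
        motion = self.proj(diff.reshape(b * t, c, h, w)).reshape(b, t, c, h, w)
        return x + motion
```

As a usage example, feeding a clip tensor of shape (batch, frames, channels, height, width), e.g. torch.randn(2, 8, 256, 16, 8), through both modules returns a tensor of the same shape, which could then be pooled over space and time into a clip-level pedestrian descriptor.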
Keywords
Person re-identification, attention mechanism, spatio-temporal, motion pattern