Discriminative Spatial Feature Learning for Person Re-Identification

MM '20: The 28th ACM International Conference on Multimedia, Seattle, WA, USA, October 2020

Abstract
Person re-identification (ReID) aims to match detected pedestrian images across multiple non-overlapping cameras. Most existing methods employ a backbone CNN to extract a vectorized feature representation by performing a global pooling operation (such as global average pooling or global max pooling) on the 3D feature map, i.e., the output of the backbone CNN. Although simple and effective in some situations, global pooling captures only statistical properties and ignores the spatial distribution of the feature map. Hence, it cannot distinguish two feature maps whose similar response values are located in totally different positions. To handle this challenge, a novel method is proposed to learn discriminative spatial features. First, a self-constrained spatial transformer network (SC-STN) is introduced to handle the misalignments caused by detection errors. Then, based on the prior knowledge that the spatial structure of a pedestrian usually remains stable along the vertical direction of an image, a novel vertical convolution network (VCN) is proposed to extract spatial features in the vertical direction. Extensive experimental evaluations on several benchmarks demonstrate that the proposed method achieves state-of-the-art performance while adding only a few parameters to the backbone.
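The limitation described above can be illustrated with a minimal NumPy sketch (not taken from the paper): two feature maps carry the same response value in opposite corners, so both global average pooling and global max pooling return identical results even though the maps are spatially distinct.

```python
import numpy as np

# Two toy single-channel feature maps with the same response value
# placed at totally different positions.
a = np.zeros((4, 4))
a[0, 0] = 1.0          # strong response in the top-left corner
b = np.zeros((4, 4))
b[3, 3] = 1.0          # same response, bottom-right corner

gap_a, gap_b = a.mean(), b.mean()   # global average pooling
gmp_a, gmp_b = a.max(), b.max()     # global max pooling

print(gap_a == gap_b)               # True: GAP cannot tell them apart
print(gmp_a == gmp_b)               # True: neither can GMP
print(np.array_equal(a, b))         # False: the maps differ spatially
```

This is exactly the failure case that motivates learning spatially aware features instead of relying on pooled statistics alone.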
Keywords
Person ReID, Spatial Feature, Self-Constrained Spatial Transformer Network, Vertical Convolution Network