Tracklet Self-Supervised Learning for Unsupervised Person Re-Identification

AAAI, pp. 12362-12369, 2020.

Keywords:
source domain; Cumulative Matching Characteristic; person re-identification; tracklet self-supervised learning; real-world applications

Abstract:

Existing unsupervised person re-identification (re-id) methods mainly focus on cross-domain adaptation or one-shot learning. Although they are more scalable than the supervised learning counterparts, relying on a relevant labelled source domain or one labelled tracklet per person for initialisation still restricts their scalability in real-world applications.

Introduction
Highlights
  • The key to person re-identification is learning a discriminative feature representation model (Liu et al 2019b; Zhang et al 2019b; Tesfaye et al 2019; Dong, Gong, and Zhu 2019)
  • We propose tracklet self-supervised learning (TSSL) to optimise a feature embedding space for both video and image unsupervised re-id
  • The contributions of this work are: (I) We propose the idea of tracklet self-supervised learning for unsupervised person re-identification
  • To fully mine the tracklet structural information for pure unsupervised person re-id, we propose a novel tracklet self-supervised learning (TSSL) method
  • The performance of TSSL increases to 71.2% rank-1 accuracy and 43.3% mean Average Precision (mAP) as λ grows from 0 to 0.1, and gradually decreases from λ = 0.1 to λ = 0.3 (see the loss-combination sketch after this list)
  • We presented a Tracklet Self-Supervised Learning (TSSL) method for unsupervised image and video person re-id
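
A minimal sketch of how the three TSSL self-supervision terms (tracklet frame coherence Lf, tracklet cluster structure Lc, and tracklet neighbourhood compactness Ln) might be combined with the weighting λ mentioned above. The paper's exact loss formulations are not reproduced here; the cosine-similarity-based forms, tensor shapes, and the combination Lc + λ(Lf + Ln) are illustrative assumptions only.

```python
# Hypothetical sketch of a combined TSSL objective; not the paper's exact losses.
import torch
import torch.nn.functional as F


def tssl_loss(frame_feats, tracklet_ids, cluster_centroids, cluster_labels,
              lam=0.1, k_neighbours=4, temperature=0.1):
    """Illustrative combination of the three TSSL terms.

    frame_feats:       (N, D) L2-normalised frame embeddings.
    tracklet_ids:      (N,)   integer tracklet index of each frame.
    cluster_centroids: (C, D) centroids of the current tracklet clusters.
    cluster_labels:    (T,)   pseudo cluster label of each tracklet
                              (ordered by sorted unique tracklet id).
    """
    # Average frame features per tracklet to obtain tracklet embeddings.
    unique_ids = tracklet_ids.unique()  # sorted by default
    tracklet_feats = torch.stack(
        [frame_feats[tracklet_ids == t].mean(0) for t in unique_ids])
    tracklet_feats = F.normalize(tracklet_feats, dim=1)

    # L_f: frame coherence -- frames should stay close to their tracklet mean.
    centroids_per_frame = tracklet_feats[
        torch.searchsorted(unique_ids, tracklet_ids)]
    l_f = (1 - (frame_feats * centroids_per_frame).sum(1)).mean()

    # L_c: cluster structure -- classify each tracklet into its pseudo cluster.
    logits = tracklet_feats @ F.normalize(cluster_centroids, dim=1).t()
    l_c = F.cross_entropy(logits / temperature, cluster_labels)

    # L_n: neighbourhood compactness -- pull each tracklet towards its
    # k nearest neighbour tracklets in the embedding space.
    sim = tracklet_feats @ tracklet_feats.t()
    sim.fill_diagonal_(-1.0)
    k = min(k_neighbours, sim.size(0) - 1)
    nn_sim, _ = sim.topk(k, dim=1)
    l_n = (1 - nn_sim).mean()

    # Assumed weighting: lambda balances the auxiliary terms against L_c.
    return l_c + lam * (l_f + l_n)
```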
Methods
Results
  • Evaluation on Video Benchmarks

    From Table 2, the authors observed similar performance trends. (1) On MARS, TSSL is the second-best model among the two pure unsupervised learning groups.
  • (2) On DukeMTMC-VideoReID, TSSL achieves the best mAP (64.6%) and rank-1 accuracy (73.9%), consistently outperforming all unsupervised learning competitors.
  • Overall, these comparisons comprehensively validate the performance of TSSL.
  • The authors have several observations: (1) With Lc alone, the model already achieves fairly strong performance: 35.1% mAP and 65.8% rank-1 accuracy.
  • This verifies the efficacy of the proposed global tracklet cluster structure mining (illustrated in the sketch after this list).
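
The paper's exact clustering procedure is not reproduced in this summary; the sketch below only illustrates the general idea of mining a global cluster structure over tracklet embeddings to obtain pseudo labels, using scikit-learn k-means as a hypothetical stand-in clustering step.

```python
# Hypothetical illustration of mining a global cluster structure over
# tracklet embeddings; the actual TSSL procedure may differ.
import numpy as np
from sklearn.cluster import KMeans


def mine_tracklet_clusters(tracklet_feats: np.ndarray, num_clusters: int):
    """Cluster tracklet embeddings into pseudo identities.

    tracklet_feats: (T, D) array, one embedding per tracklet.
    num_clusters:   assumed number of pseudo identities.
    Returns pseudo labels (T,) and cluster centroids (num_clusters, D).
    """
    kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
    pseudo_labels = kmeans.fit_predict(tracklet_feats)
    return pseudo_labels, kmeans.cluster_centers_
```

The resulting pseudo labels and centroids could then drive a cluster-structure term such as the Lc sketched earlier, re-estimated periodically as the feature embedding improves.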
Conclusion
  • The authors presented a Tracklet Self-Supervised Learning (TSSL) method for unsupervised image and video person re-id.
  • Doing so maximises the scalability and usability of TSSL in arbitrarily unconstrained domains
  • This eliminates the expensive cross-camera person identity annotation required by conventional supervised learning methods, the source-domain supervision required by unsupervised cross-domain adaptation methods, and the camera-view prior knowledge required by existing unsupervised tracklet association methods.
Summary
  • Introduction:

    The key to person re-identification is learning a discriminative feature representation model (Liu et al 2019b; Zhang et al 2019b; Tesfaye et al 2019; Dong, Gong, and Zhu 2019).
  • While existing supervised learning-based re-id methods have advanced significantly (Fu et al 2019; Wu, Zhu, and Gong 2019b), they fundamentally suffer from an unrealistic assumption of requiring a large set of cross-camera labelled training data (Yu et al 2019; Li, Zhu, and Gong 2018a).
  • Recent studies have shifted to capitalise on abundant unlabelled data for unsupervised model optimisation (Lin et al 2019; Zhang et al 2019a).
  • Figure: unlabelled tracklets from Camera 1 and Camera 2 serve as the model's training input.
  • Methods:

    Table 2 compares TSSL against TJ-AIDL† (Wang et al 2018), SPGAN† (Deng et al 2018), PTGAN† (Wei et al 2018), HHL† (Zhong et al 2018), PAUL† (Yang et al 2019), ATNet† (Liu et al 2019a), DGM (Ye et al 2017), Stepwise (Liu, Wang, and Lu 2017), RACE (Ye, Lan, and Yuen 2018), EUG (Wu et al 2018a), TAUDL (Li, Zhu, and Gong 2018a), DAL (Chen, Zhu, and Gong 2018), OIM (Xiao et al 2017), and BUC (Lin et al 2019), reporting mAP and rank-1 (R1) on Market, Duke, and MARS under three supervision settings:
  • Cross-domain: large labelled source-domain person ID labels.
  • One-shot: one labelled tracklet (ID) per person.
  • Pure unsupervised: no labels.
  • Results:

    Evaluation on Video Benchmarks

    From Table 2, the authors observed similar performance trends. (1) On MARS, TSSL is the second-best model among the two pure unsupervised learning groups.
  • (2) On DukeMTMC-VideoReID, TSSL achieves the best mAP (64.6%) and rank-1 accuracy (73.9%), consistently outperforming all unsupervised learning competitors.
  • Overall, these comparisons comprehensively validate the performance of TSSL.
  • The authors have several observations: (1) With Lc alone, the model already achieves fairly strong performance: 35.1% mAP and 65.8% rank-1 accuracy.
  • This verifies the efficacy of the proposed global tracklet cluster structure mining.
  • Conclusion:

    The authors presented a Tracklet Self-Supervised Learning (TSSL) method for unsupervised image and video person re-id.
  • Doing so maximises the scalability and usability of TSSL in arbitrarily unconstrained domains
  • This eliminates the expensive cross-camera person identity annotation required by conventional supervised learning methods, the source-domain supervision required by unsupervised cross-domain adaptation methods, and the camera-view prior knowledge required by existing unsupervised tracklet association methods.
Tables
  • Table1: The evaluation setting statistics. Market-1501, DukeMTMC-ReID and DukeMTMC-VideoReID are abbreviated as Market, Duke, and DukeVideo, respectively
  • Table2: Comparisons with the state-of-the-art person re-id methods on Market-1501, DukeMTMC-ReID, MARS and DukeMTMC-VideoReID. The best results are in bold. †: Unsupervised cross-domain setting, Market (source) ⇒ Duke (target)
  • Table3: Evaluating the self-supervised learning components of TSSL on Market-1501. Lf: Tracklet frame coherence learning; Lc: Tracklet cluster structure learning; Ln: Tracklet neighbourhood compactness learning (a sketch of the rank-1/mAP metrics these tables report follows this list)
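
Both tables report mAP and rank-1 (Cumulative Matching Characteristic) accuracy. Below is a simplified sketch of how these metrics are typically computed; it omits the cross-camera filtering of the standard Market-1501/Duke evaluation protocols and is illustrative only.

```python
# Simplified rank-1 (CMC) and mAP computation for re-id retrieval.
import numpy as np


def rank1_and_map(query_feats, query_ids, gallery_feats, gallery_ids):
    """query_feats: (Q, D), gallery_feats: (G, D), ids: integer identity labels."""
    # Euclidean distance between every query and every gallery embedding.
    dist = np.linalg.norm(
        query_feats[:, None, :] - gallery_feats[None, :, :], axis=2)
    rank1_hits, average_precisions = [], []
    for q in range(len(query_ids)):
        order = np.argsort(dist[q])                     # nearest gallery first
        matches = (gallery_ids[order] == query_ids[q]).astype(np.float32)
        rank1_hits.append(matches[0])                   # top-1 correct or not
        if matches.sum() == 0:
            continue                                    # no relevant gallery item
        cum_hits = np.cumsum(matches)
        hit_ranks = np.where(matches == 1)[0] + 1       # 1-based ranks of hits
        average_precisions.append((cum_hits[matches == 1] / hit_ranks).mean())
    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))
```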
Related work
  • Most existing person re-id methods are based on supervised learning, which requires labelled pairs of person images for training (Li, Zhu, and Gong 2018b; Wu, Zhu, and Gong 2019a), leading to limited scalability in deployment. In contrast, unsupervised re-id can learn from unlabelled data without exhaustive manual annotation, making it possible to leverage the massive amount of available unlabelled data. In this section, we mainly review and discuss unsupervised person re-id.

    Unsupervised Cross-Domain Person Re-ID

    Transfer learning is one of the most important strategies for addressing unsupervised re-id, i.e. unsupervised cross-domain person re-id. Existing methods typically pre-train a model on source domains with rich labelled training data, and then transfer this model to an unlabelled target domain (Yu et al 2019; Liu et al 2019a; Zhong et al 2018; 2019). In (Yu et al 2019), Yu et al propose using a set of labelled persons from a source domain as references to facilitate soft multi-label estimation for unlabelled persons in a target domain. In (Yang et al 2019), Yang et al introduce a patch-based model to learn discriminative features; they pre-train this model on a large-scale labelled source dataset before fine-tuning it on an unlabelled target dataset with patch-level and image-level learning constraints. In (Wang et al 2018), Wang et al transfer both identity and attribute information from a labelled source domain to an unlabelled target domain by extracting attribute-semantic and identity-discriminative feature representations. Different from these methods, we focus on pure unsupervised re-id, where no prior knowledge is available from any labelled source domain. This further scales the learning algorithm to arbitrarily unconstrained and unlabelled domains, without the need to select relevant source domains.
Funding
  • This work is supported by Queen Mary University of London Principal’s Scholarship, Vision Semantics Limited, Alan Turing Institute Turing Fellowship, and Innovate UK Industrial Challenge Project on Developing and Commercialising Intelligent Video Analytics Solutions for Public Safety (98111-571149)
References
  • Caron, M.; Bojanowski, P.; Joulin, A.; and Douze, M. 2018. Deep clustering for unsupervised learning of visual features. In ECCV.
  • Chen, Y.; Zhu, X.; and Gong, S. 2018. Deep association learning for unsupervised video person re-identification. In BMVC.
  • Deng, W.; Zheng, L.; Ye, Q.; Kang, G.; Yang, Y.; and Jiao, J. 2018. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. In CVPR.
  • Dong, Q.; Gong, S.; and Zhu, X. 2019. Person search by text attribute query as zero-shot learning. In ICCV.
  • Fu, Y.; Wei, Y.; Zhou, Y.; Shi, H.; Huang, G.; Wang, X.; Yao, Z.; and Huang, T. 2019. Horizontal pyramid matching for person re-identification. In AAAI.
  • He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR.
  • Hermans, A.; Beyer, L.; and Leibe, B. 2017. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737.
  • Hinton, G.; Vinyals, O.; and Dean, J. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
  • Huang, J.; Dong, Q.; Gong, S.; and Zhu, X. 2019. Unsupervised deep learning by neighbourhood discovery. In ICML.
  • Li, M.; Zhu, X.; and Gong, S. 2018a. Unsupervised person re-identification by deep learning tracklet association. In ECCV.
  • Li, W.; Zhu, X.; and Gong, S. 2018b. Harmonious attention network for person re-identification. In CVPR.
  • Liao, S.; Hu, Y.; Zhu, X.; and Li, S. Z. 2015. Person re-identification by local maximal occurrence representation and metric learning. In CVPR.
  • Lin, Y.; Dong, X.; Zheng, L.; Yan, Y.; and Yang, Y. 2019. A bottom-up clustering approach to unsupervised person re-identification. In AAAI.
  • Liu, J.; Zha, Z.-J.; Chen, D.; Hong, R.; and Wang, M. 2019a. Adaptive transfer network for cross-domain person re-identification. In CVPR.
  • Liu, Y.; Yuan, Z.; Zhou, W.; and Li, H. 2019b. Spatial and temporal mutual promotion for video-based person re-identification. In AAAI.
  • Liu, Z.; Wang, D.; and Lu, H. 2017. Stepwise metric promotion for unsupervised video person re-identification. In ICCV.
  • Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; and Tomasi, C. 2016. Performance measures and a data set for multi-target, multi-camera tracking. In ECCV.
  • Tesfaye, Y. T.; Zemene, E.; Prati, A.; Pelillo, M.; and Shah, M. 2019. Multi-target tracking in multiple non-overlapping cameras using fast-constrained dominant sets. IJCV.
  • Wang, F.; Xiang, X.; Cheng, J.; and Yuille, A. L. 2017. NormFace: L2 hypersphere embedding for face verification. In ACM MM.
  • Wang, J.; Zhu, X.; Gong, S.; and Li, W. 2018. Transferable joint attribute-identity deep learning for unsupervised person re-identification. In CVPR.
  • Wei, L.; Zhang, S.; Gao, W.; and Tian, Q. 2018. Person transfer GAN to bridge domain gap for person re-identification. In CVPR.
  • Wu, Y.; Lin, Y.; Dong, X.; Yan, Y.; Ouyang, W.; and Yang, Y. 2018a. Exploit the unknown gradually: One-shot video-based person re-identification by stepwise learning. In CVPR.
  • Wu, Z.; Xiong, Y.; Yu, S. X.; and Lin, D. 2018b. Unsupervised feature learning via non-parametric instance discrimination. In CVPR.
  • Wu, G.; Zhu, X.; and Gong, S. 2019a. Person re-identification by ranking ensemble representations. In ICIP.
  • Wu, G.; Zhu, X.; and Gong, S. 2019b. Spatio-temporal associative representation for video person re-identification. In BMVC.
  • Xiao, T.; Li, S.; Wang, B.; Lin, L.; and Wang, X. 2017. Joint detection and identification feature learning for person search. In CVPR.
  • Yang, Q.; Yu, H.-X.; Wu, A.; and Zheng, W.-S. 2019. Patch-based discriminative feature learning for unsupervised person re-identification. In CVPR.
  • Yang, J.; Parikh, D.; and Batra, D. 2016. Joint unsupervised learning of deep representations and image clusters. In CVPR.
  • Ye, M.; Ma, A. J.; Zheng, L.; Li, J.; and Yuen, P. C. 2017. Dynamic label graph matching for unsupervised video re-identification. In ICCV.
  • Ye, M.; Zhang, X.; Yuen, P. C.; and Chang, S.-F. 2019. Unsupervised embedding learning via invariant and spreading instance feature. In CVPR.
  • Ye, M.; Lan, X.; and Yuen, P. C. 2018. Robust anchor embedding for unsupervised video person re-identification in the wild. In ECCV.
  • Yu, H.-X.; Zheng, W.-S.; Wu, A.; Guo, X.; Gong, S.; and Lai, J.-H. 2019. Unsupervised person re-identification by soft multilabel learning. In CVPR.
  • Zhang, X.; Cao, J.; Shen, C.; and You, M. 2019a. Self-training with progressive augmentation for unsupervised cross-domain person re-identification. In ICCV.
  • Zhang, Y.; Zhong, Q.; Ma, L.; Xie, D.; and Pu, S. 2019b. Learning incremental triplet margin for person re-identification. In AAAI.
  • Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; and Tian, Q. 2015. Scalable person re-identification: A benchmark. In ICCV.
  • Zheng, L.; Bie, Z.; Sun, Y.; Wang, J.; Su, C.; Wang, S.; and Tian, Q. 2016. MARS: A video benchmark for large-scale person re-identification. In ECCV.
  • Zheng, Z.; Zheng, L.; and Yang, Y. 2017. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In ICCV.
  • Zhong, Z.; Zheng, L.; Li, S.; and Yang, Y. 2018. Generalizing a person retrieval model hetero- and homogeneously. In ECCV.
  • Zhong, Z.; Zheng, L.; Luo, Z.; Li, S.; and Yang, Y. 2019. Invariance matters: Exemplar memory for domain adaptive person re-identification. In CVPR.