Multi-Mutual Consistency Induced Transfer Subspace Learning for Human Motion Segmentation

CVPR, pp. 10274-10283, 2020.

Keywords:
Augmented Lagrange Multiplier, Temporal Subspace Clustering, Low-rank Transfer Subspace, domain invariant, UT-Interaction Dataset

Abstract:

Human motion segmentation based on transfer subspace learning is a rising interest in action-related tasks. Although progress has been made, there are still several issues within the existing methods. First, existing methods transfer knowledge from source data to target tasks by learning domain-invariant features, but they ignore to prese...

Introduction
  • Human motion segmentation aims to partition visual data sequences that depict human actions and activities into a set of preferably non-overlapping and internally coherent temporal segments.
  • It is an important preprocessing step before further motion and action related analytical tasks [26, 38, 48, 59].
  • Subspace clustering-based methods have attracted notable attention and obtained promising results.
Highlights
  • Human motion segmentation aims to partition visual data sequences that depict human actions and activities into a set of preferably non-overlapping and internally coherent temporal segments
  • The videos were obtained by using a fixed camera with the subjects standing in front of a static and simple background. Multi-Modal Action Detection Dataset (MAD) [17] consists of actions captured in multiple modalities by using a Microsoft Kinect V2 system in RGB, depth and skeleton formats
  • We have proposed a multi-mutual consistency induced transfer subspace learning framework for human motion segmentation
  • Our model first factorizes the original features of the source and target data into implicit multi-layer feature spaces, in which we use a mutual consistency learning strategy to reduce the distribution difference between the two domains
  • We present a temporal correlation preservation term to improve the effectiveness of learned representations
  • Experimental results on benchmark datasets show that our method can significantly outperform the state-of-the-art methods
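The temporal correlation preservation term mentioned above is not spelled out on this page. As a rough illustration only, the sketch below uses a temporal graph Laplacian penalty of the kind found in temporal subspace clustering [25]; the function names, the window size and the exact form of the penalty are assumptions, not the authors' formulation. It shows that codes which stay constant within a segment are penalized less than codes which flip between neighbouring frames.

```python
import numpy as np

def temporal_laplacian(n_frames: int, half_window: int = 1) -> np.ndarray:
    """Graph Laplacian L = D - W that links each frame to its temporal neighbours.

    half_window is a hypothetical parameter: frames i and j are linked
    when 0 < |i - j| <= half_window.
    """
    idx = np.arange(n_frames)
    gap = np.abs(idx[:, None] - idx[None, :])
    W = ((gap > 0) & (gap <= half_window)).astype(float)
    return np.diag(W.sum(axis=1)) - W

def temporal_penalty(Z: np.ndarray, L: np.ndarray) -> float:
    """tr(Z L Z^T): sum of squared differences between codes of linked frames.

    Columns of Z are the per-frame representations.
    """
    return float(np.trace(Z @ L @ Z.T))

L = temporal_laplacian(4)
segmented = np.array([[0.0, 0.0, 1.0, 1.0]])    # one change point
alternating = np.array([[0.0, 1.0, 0.0, 1.0]])  # changes at every frame

print(temporal_penalty(segmented, L))    # 1.0
print(temporal_penalty(alternating, L))  # 3.0
```

Adding such a penalty to a representation-learning objective would favour codes that change slowly over time, which is the property that temporal segmentation relies on.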
Methods
  • Table 1 compares SC, KMD, LRR, OSC, SSC, LSR, TSC, TSS, LTS and the proposed method (Ours). Where a method name carries a letter in brackets, the letter gives the source dataset (K, M, W or U).
  • The first panel uses M, W and U as source datasets. Panels (b), (c) and (d) report results on the MAD, Weizmann and UT datasets, with source datasets (K, W, U), (K, M, U) and (K, M, W) respectively.
  • Figure 3: Visualization of clustering results on a sample video of the Keck dataset, comparing Ours, LTS, TSS, TSC, SSC and LRR with the ground truth (GT).
Conclusion
  • The authors have proposed a multi-mutual consistency induced transfer subspace learning framework for human motion segmentation.
  • The authors' model first factorizes the original features of the source and target data into implicit multi-layer feature spaces, in which the authors use a mutual consistency learning strategy to reduce the distribution difference between the two domains.
  • The authors carry out the transfer subspace learning in multi-level feature spaces to effectively exploit different-level structural information.
Tables
  • Table 1: Clustering comparison results in terms of NMI and ACC on four human motion datasets. Names in brackets indicate the source datasets. M, K,
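For readers unfamiliar with the two metrics named in Table 1, here is a minimal sketch of how clustering accuracy (ACC) and normalized mutual information (NMI) are commonly computed. The function names are illustrative, the geometric-mean normalization of NMI is an assumption (the page does not say which variant the paper uses), and the brute-force label matching is only practical for a handful of clusters.

```python
from collections import Counter
from itertools import permutations
from math import log

def clustering_accuracy(truth, pred):
    """Fraction of frames labelled correctly under the best one-to-one
    mapping from predicted cluster ids to ground-truth labels (brute force)."""
    t_labels = sorted(set(truth))
    p_labels = sorted(set(pred))
    k = max(len(t_labels), len(p_labels))
    # Pad with None so that both sides have k entries to match up.
    t_pad = t_labels + [None] * (k - len(t_labels))
    p_pad = p_labels + [None] * (k - len(p_labels))
    counts = Counter(zip(pred, truth))
    best = max(
        sum(counts[(p, t)] for p, t in zip(p_pad, perm))
        for perm in permutations(t_pad)
    )
    return best / len(truth)

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def nmi(truth, pred):
    """Mutual information divided by sqrt(H(truth) * H(pred)) (assumed variant)."""
    n = len(truth)
    ct, cp = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    mi = sum(c / n * log(c * n / (ct[t] * cp[p])) for (t, p), c in joint.items())
    denom = (entropy(truth) * entropy(pred)) ** 0.5
    if denom == 0:
        # At least one labelling is a single cluster.
        return 1.0 if entropy(truth) == entropy(pred) else 0.0
    return mi / denom

truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
pred = [1, 1, 1, 2, 2, 0, 0, 0, 0]  # cluster ids are arbitrary; one frame is misassigned
print(clustering_accuracy(truth, pred))
print(nmi(truth, pred))
```

Both scores are invariant to how the predicted clusters are numbered. With many clusters, the permutation search is usually replaced by the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`.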
Related work
  • Subspace clustering builds on the assumption that data points are drawn from multiple subspaces corresponding to different clusters. Recently, self-representation based subspace clustering, where each data point is expressed as a linear combination of other data points, has captured increasing attention [60, 53, 61]. For example, Sparse Subspace Clustering (SSC) [8] searches for the sparsest representation among the infinitely many possible representations based on the ℓ1-norm. Low-Rank Representation Clustering (LRR) [29] attempts to reveal cluster structure with a low-rank representation. SMooth Representation clustering (SMR) [16] analyzes the grouping effect of representation-based methods. There are also several deep learning-based subspace clustering approaches [19, 37, 53, 55, 57]. However, these methods cannot be directly applied to human motion segmentation since they ignore the temporal correlations between successive frames.
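To make the self-representation idea concrete, the sketch below uses the ridge-regularized (least squares) variant, which has a closed-form solution, rather than the ℓ1 or nuclear-norm objectives of SSC and LRR. It is an illustrative assumption, not the method of this paper; the function names and the value of `lam` are hypothetical. Each column of `X` is one sample, and the coefficient matrix `Z` expresses every sample as a combination of the others.

```python
import numpy as np

def lsr_self_representation(X: np.ndarray, lam: float = 0.1) -> np.ndarray:
    """Minimize ||X - X Z||_F^2 + lam * ||Z||_F^2 over Z.

    Setting the gradient to zero gives (X^T X + lam I) Z = X^T X.
    """
    G = X.T @ X
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), G)

def affinity(Z: np.ndarray) -> np.ndarray:
    """Symmetric affinity matrix that would be passed to spectral clustering."""
    return np.abs(Z) + np.abs(Z.T)

rng = np.random.default_rng(0)
X = np.zeros((4, 6))
X[:2, :3] = rng.normal(size=(2, 3))  # three samples in span(e1, e2)
X[2:, 3:] = rng.normal(size=(2, 3))  # three samples in span(e3, e4)

A = affinity(lsr_self_representation(X))
print("largest within-subspace affinity:", A[:3, :3].max())
print("largest cross-subspace affinity: ", A[:3, 3:].max())
```

Because the two subspaces here are orthogonal, the affinity between samples from different subspaces vanishes up to rounding, so spectral clustering on `A` would separate them. None of this uses frame order, which is the limitation the paragraph above points out for motion segmentation.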
Funding
  • Acknowledgements: This research was supported in part by NSF of China (No:61973162), NSF of Jiangsu Province (No:BK20171430), the Fundamental Research Funds for the Central Universities (No:30918011319), Zhejiang Lab’s Open Fund (No:2019KD0AB04), CCF-Tencent Open Fund, the “Young Elite Scientists Sponsorship Program” by Jiangsu Province, and the “Young Elite Scientists Sponsorship Program” by CAST (No:2018QNRC001)
Reference
  • S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. 4, 5
  • L. Bruzzone and M. Marconcini. Domain adaptation problems: A dasvm classification technique and a circular validation strategy. IEEE TPAMI, 32(5):770–787, 2009. 2
  • J.-F. Cai, Emmanuel J. Candes, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010. 5
  • X. Cao, C. Zhang, C. Zhou, H. Fu, and H. Foroosh. Constrained multi-view video face clustering. IEEE TIP, 24(11):4381–4393, 2015. 1
  • Y. Cui, Y. Song, C. Sun, A. Howard, and S. Belongie. Large scale fine-grained categorization and domain-specific transfer learning. In CVPR, pages 4109–4118, 2018. 1
  • N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, pages 886–893, 2005. 6
  • Z. Ding and Y. Fu. Deep transfer low-rank coding for cross-domain learning. IEEE TNNLS, 30(6):1768–1779, 2018. 3
  • E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE TPAMI, 35(11):2765–2781, 2013. 1, 2, 6
  • K. Fu, D.-P. Fan, G.-P. Ji, and Q. Zhao. Jl-dcf: Joint learning and densely-cooperative fusion framework for rgb-d salient object detection. In CVPR, 2020. 8
  • Y. Ganin and V. Lempitsky. Unsupervised domain adaptation by backpropagation. ICML, pages 1180–1189, 2015. 3
  • M. Geng, Y. Wang, T. Xiang, and Y. Tian. Deep transfer learning for person re-identification. arXiv preprint arXiv:1611.05244, 2016. 3
  • B. Gholami and V. Pavlovic. Probabilistic temporal subspace clustering. In CVPR, pages 3066–3075, 2017. 1
  • B. Gong, K. Grauman, and F. Sha. Connecting the dots with landmarks: Discriminatively learning domain-invariant features for unsupervised domain adaptation. In ICML, pages 222–230, 202
  • L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. IEEE TPAMI, 29(12):2247–2253, 2007. 6
  • G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006. 5
  • H. Hu, Z. Lin, J. Feng, and J. Zhou. Smooth representation clustering. In CVPR, pages 3834–3841, 2014. 1, 2
  • D. Huang, S. Yao, Y. Wang, and F. De la Torre. Sequential max-margin event detectors. In ECCV, pages 410–424. Springer, 2014. 6
  • J. Huang, F. Nie, and H. Huang. A new simplex sparse learning model to measure data similarity for clustering. In IJCAI, 2015. 5
  • P. Ji, T. Zhang, H. Li, M. Salzmann, and I. Reid. Deep subspace clustering networks. In NIPS, pages 24–33, 2017. 2
  • S. Jiang, Z. Ding, and Y. Fu. Heterogeneous recommendation via deep low-rank sparse collective factorization. IEEE TPAMI, 2019. 3
  • Z. Jiang, Z. Lin, and L. Davis. Recognizing human actions by learning and matching shape-motion prototype trees. IEEE TPAMI, 34(3):533–547, 2012. 6
  • E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra. Locally adaptive dimensionality reduction for indexing large time series databases. In ACM Sigmod Record, volume 30, pages 151–162. ACM, 2001. 1
  • E. Keogh and S. Kasetty. On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Mining and Knowledge Discovery, 7(4):349–371, 2003. 1
  • H. Kriegel, P. Kröger, and A. Zimek. Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM TKDD, 3(1):1, 2009. 1
  • S. Li, K. Li, and Y. Fu. Temporal subspace clustering for human motion segmentation. In ICCV, pages 4453–4461, 2015. 1, 2, 4, 6
  • T. Li, Z. Liang, S. Zhao, J. Gong, and J. Shen. Self-learning with rectification strategy for human parsing. In CVPR, 2020. 1
  • W. Li, Z. Xu, D. Xu, D. Dai, and L. Van Gool. Domain generalization and adaptation using low rank exemplar svms. IEEE TPAMI, 40(5):1114–1127, 2017. 2
  • Z. Lin, R. Liu, and Z. Su. Linearized alternating direction method with adaptive penalty for low-rank representation. In NIPS, pages 612–620, 2011. 4
  • G. Liu, Z. Lin, et al. Robust recovery of subspace structures by low-rank representation. IEEE TPAMI, 35(1):171–184, 2013. 1, 2, 3, 4, 6
  • G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma. Robust recovery of subspace structures by low-rank representation. IEEE TPAMI, 35(1):171–184, 2012. 5
  • M. Long, Y. Cao, J. Wang, and M. I. Jordan. Learning transferable features with deep adaptation networks. In ICML, pages 97–105, 2015. 3
  • C.-Y. Lu, H. Min, Z.-Q. Zhao, L. Zhu, D.-S. Huang, and S. Yan. Robust and efficient subspace segmentation via least squares regression. In ECCV, pages 347–360. Springer, 2012. 1, 6
  • A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, pages 849–856, 2002. 1, 6
  • J. Ni, Q. Qiu, and R. Chellappa. Subspace interpolation via dictionary learning for unsupervised domain adaptation. In CVPR, pages 692–699, 2013. 2
  • Z. Niu, M. Zhou, L. Wang, X. Gao, and G. Hua. Hierarchical multimodal lstm for dense visual-semantic embedding. In ICCV, pages 1881–1889, 2017. 3
  • P. Pan, Z. Xu, Y. Yang, F. Wu, and Y. Zhuang. Hierarchical recurrent neural encoder for video representation with application to captioning. In CVPR, pages 1029–1038, 2016. 3
  • X. Peng, J. Feng, S. Xiao, W. Y. Yau, J. T. Zhou, and S. Yang. Structured autoencoders for subspace clustering. IEEE TIP, 27(10):5076–5086, 2018. 2
  • S. Qi, W. Wang, B. Jia, J. Shen, and S.-C. Zhu. Learning human-object interactions by graph parsing neural networks. In ECCV, pages 401–417, 2018. 1
  • M. W. Robards and P. Sunehag. Semi-markov kmeans clustering and activity recognition from body-worn sensors. In ICDM, pages 438–446. IEEE, 2009. 2
  • M. S. Ryoo and J. K. Aggarwal. Spatio-temporal relationship match: video structure comparison for recognition of complex human activities. In ICCV, volume 1, page 2. Citeseer, 2009. 6
  • M. Shao, D. Kit, and Y. Fu. Generalized transfer subspace learning through low-rank constraint. IJCV, 109(1-2):74–93, 2014. 3
  • S. Shekhar, V. M. Patel, H. V. Nguyen, and R. Chellappa. Generalized domain-adaptive dictionaries. In CVPR, pages 361–368, 2013. 2
  • J. Shi and J. Malik. Motion segmentation and tracking using normalized cuts. In ICCV, pages 1154–1160. IEEE, 1998. 4
  • S. Tierney, J. Gao, and Y. Guo. Subspace clustering for sequential data. In CVPR, pages 1019–1026, 2014. 6
  • E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko. Simultaneous deep transfer across domains and tasks. In ICCV, pages 4068–4076, 2015. 3
  • L. Wang, Z. Ding, and Y. Fu. Learning transferable subspace for human motion segmentation. In AAAI, 2018. 1, 2, 4, 6
  • L. Wang, Z. Ding, and Y. Fu. Low-rank transfer human motion segmentation. IEEE TIP, 28(2):1023–1034, 2018. 1, 2, 6
  • W. Wang, Z. Zhang, S. Qi, J. Shen, Y. Pang, and L. Shao. Learning compositional neural information fusion for human parsing. In ICCV, pages 5703–5713, 2019. 1
  • Y. Xiong and D.-Y. Yeung. Mixtures of ARMA models for model-based time series clustering. In ICDM, pages 717–720. IEEE, 2002. 1
  • Y. Xu, X. Fang, J. Wu, X. Li, and D. Zhang. Discriminative transfer subspace learning via low-rank and sparse representation. IEEE TIP, 25(2):850–863, 2015. 3
  • M. Ye and J. Shen. Probabilistic structural latent representation for unsupervised embedding. In CVPR, 2020. 3
  • A. R. Zamir, A. Sax, W. Shen, L. J. Guibas, J. Malik, and S. Savarese. Taskonomy: Disentangling task transfer learning. In CVPR, pages 3712–3722, 2018. 1
  • C. Zhang, H. Fu, Q. Hu, X. Cao, Y. Xie, D. Tao, and D. Xu. Generalized latent multi-view subspace clustering. IEEE TPAMI, 42(1):86–99, 2018. 2
  • J. Zhang, W. Li, and P. Ogunbona. Joint geometrical and statistical alignment for visual domain adaptation. In CVPR, pages 1859–1867, 2017. 2
  • H. Zhao, Z. Ding, and Y. Fu. Multi-view clustering via deep matrix factorization. In AAAI, pages 2921–2927, 2017. 2
  • F. Zhou, F. De la Torre, and J. K. Hodgins. Hierarchical aligned cluster analysis for temporal clustering of human motion. IEEE TPAMI, 35(3):582–596, 2012. 2
  • P. Zhou, Y. Hou, and J. Feng. Deep adversarial subspace clustering. In CVPR, pages 1596–1604, 2018. 2
  • T. Zhou, H. Fu, G. Chen, J. Shen, and L. Shao. Hi-Net: Hybrid-fusion network for multi-modal MR image synthesis. IEEE TMI, 2020. 8
  • T. Zhou, W. Wang, S. Qi, H. Ling, and J. Shen. Cascaded human-object interaction recognition. In CVPR, 2020. 1
  • T. Zhou, C. Zhang, C. Gong, H. Bhaskar, and J. Yang. Multiview latent space learning with feature redundancy minimization. IEEE TCYB, 2018. 2
  • T. Zhou, C. Zhang, X. Peng, H. Bhaskar, and J. Yang. Dual shared-specific multiview subspace clustering. IEEE TCYB, 2019. 2
  • F. Zhu and L. Shao. Weakly-supervised cross-domain dictionary learning for visual recognition. IJCV, 109(1-2):42–59, 2014. 3