Cross-View Action Recognition Using View-Invariant Pose Feature Learned From Synthetic Data With Domain Adaptation

Computer Vision - ACCV 2018, Part II (2018)

Abstract
Recognizing human activities from unknown views is a challenging problem, since human shapes appear quite different from different viewpoints. In this paper, we learn a View-Invariant Pose (VIP) feature for depth-based cross-view action recognition. The proposed VIP feature encoder is a deep convolutional neural network that maps human poses from multiple viewpoints into a shared high-level feature space. Learning such a deep model requires a large corpus of multi-view paired data, which is very expensive to collect. Therefore, we generate a synthetic dataset by fitting human physical models to real motion-capture data in simulators and rendering depth images from various viewpoints. The VIP feature is learned from the synthetic data in an unsupervised way. To ensure transferability from synthetic to real data, domain adaptation is employed to minimize the domain difference. Moreover, an action can be considered a sequence of poses, and its temporal progress is modeled by a recurrent neural network. In the experiments, our method is applied to two benchmark multi-view 3D human action datasets and achieves promising results compared with state-of-the-art methods.
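To make the described pipeline concrete, below is a minimal PyTorch sketch of a per-frame pose encoder followed by a recurrent classifier, matching the abstract's structure (CNN on depth frames, RNN over the resulting pose-feature sequence). All module names, layer sizes, and hyperparameters are illustrative assumptions, not the paper's actual implementation; the domain-adaptation loss (e.g., an MMD or adversarial discrepancy term between synthetic and real features) is only noted in a comment.

```python
import torch
import torch.nn as nn

class VIPEncoder(nn.Module):
    """Maps a single-view depth image of a pose to a feature vector.
    Illustrative stand-in for the paper's VIP feature encoder."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, feat_dim)

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        # depth: (B, 1, H, W) -> (B, feat_dim)
        return self.fc(self.conv(depth).flatten(1))

class ActionClassifier(nn.Module):
    """Encodes each frame, then models temporal progress with an RNN.
    Training would add a domain-discrepancy loss on the per-frame
    features to align synthetic and real depth data (assumption)."""
    def __init__(self, feat_dim: int = 256, hidden: int = 128,
                 num_classes: int = 10):
        super().__init__()
        self.encoder = VIPEncoder(feat_dim)
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, T, 1, H, W) -> class logits (B, num_classes)
        b, t = clip.shape[:2]
        feats = self.encoder(clip.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.rnn(feats)
        return self.head(h[-1])  # classify from final hidden state

# Usage sketch: a batch of 4 clips, 16 frames each, 64x64 depth maps.
logits = ActionClassifier()(torch.randn(4, 16, 1, 64, 64))
```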
Keywords
Action recognition, Domain adaptation, Cross-view, Deep learning