Train, Diagnose and Fix: Interpretable Approach for Fine-grained Action Recognition.

arXiv: Computer Vision and Pattern Recognition(2017)

引用 23|浏览22
暂无评分
摘要
Despite the growing discriminative capabilities of modern deep learning methods for recognition tasks, the inner workings of the state-of-art models still remain mostly black-boxes. In this paper, we propose a systematic interpretation of model parameters and hidden representations of Residual Temporal Convolutional Networks (Res-TCN) for action recognition in time-series data. We also propose a Feature Map Decoder as part of the interpretation analysis, which outputs a representation of modelu0027s hidden variables in the same domain as the input. Such analysis empowers us to expose modelu0027s characteristic learning patterns in an interpretable way. For example, through the diagnosis analysis, we discovered that our model has learned to achieve view-point invariance by implicitly learning to perform rotational normalization of the input to a more discriminative view. Based on the findings from the model interpretation analysis, we propose a targeted refinement technique, which can generalize to various other recognition models. The proposed work introduces a three-stage paradigm for model learning: training, interpretable diagnosis and targeted refinement. We validate our approach on skeleton based 3D human action recognition benchmark of NTU RGB+D. We show that the proposed workflow is an effective model learning strategy and the resulting Multi-stream Residual Temporal Convolutional Network (MS-Res-TCN) achieves the state-of-the-art performance on NTU RGB+D.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要