iTAML: An Incremental Task-Agnostic Meta-learning Approach

CVPR, pp. 13585-13594, 2020.

Keywords:
MS-Celeb, Riemannian Walk, catastrophic forgetting, single model, task-agnostic meta-learning
In brief:
We introduce a novel meta-learning approach that seeks to maintain an equilibrium between all the encountered tasks.

Abstract:

Humans can continuously learn new knowledge as their experience grows. In contrast, previously learned knowledge in deep neural networks can quickly fade out when they are trained on a new task. In this paper, we hypothesize that this problem can be avoided by learning a set of generalized parameters that are neither specific to old nor new tasks.

Introduction
  • Visual content is ever-evolving and its volume is rapidly increasing each day. The high dimensionality and sheer volume of visual media make it impractical to store ephemeral or streaming data and process it all at once.
  • This paper studies class-incremental learning where groups of classes are sequentially observed.
  • This case is fundamentally different from the conventional classification task due to two key factors: plasticity and stability.
  • This paper takes a different approach to IL based on the following principle: instead of a ‘one size fits all’ approach that learns a single model well-suited for all tasks, the authors propose to learn a generic meta-model which can be quickly adapted to the desired task.
  • The meta-learning framework focuses on learning ‘how to rapidly learn’. The generality of the model arises from the ‘learning to learn’ training strategy, which focuses on finding the shared parametric space common to all hitherto observed tasks
Highlights
  • Visual content is ever-evolving and its volume is rapidly increasing each day
  • This paper takes a different approach to Incremental Learning based on the following principle: instead of a ‘one size fits all’ approach that learns a single model well-suited for all tasks, we propose to learn a generic meta-model which can be quickly adapted to the desired task
  • We empirically found that a Reptile-style meta-update is not useful for Incremental Learning, while our proposed update rule helps in finding task-specific weights useful for an optimal outer-loop update (see the sketch after this list)
  • Datasets: We evaluate our method on a wide spectrum of incremental learning benchmarks
  • We develop a meta-learning approach to train a generic model that can be fast updated for a specific task
  • Our experiments demonstrate consistent improvements across a range of classification datasets including ImageNet, CIFAR100, MNIST, SVHN and MS-Celeb
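As a rough illustration of the meta-update contrast noted above, below is a minimal PyTorch-style sketch of a generic first-order (Reptile-flavoured) outer loop over all tasks seen so far. The names model, exemplar_loaders, inner_lr, meta_lr, and inner_steps are assumptions made for this sketch, and the averaged-displacement update it shows is the generic first-order rule rather than the authors' exact iTAML update.

    import copy
    import torch
    import torch.nn.functional as F

    def meta_update(model, exemplar_loaders, inner_lr=0.01, meta_lr=0.5, inner_steps=1):
        """One outer-loop step: adapt a copy of the generic weights to each task seen
        so far, then move the generic weights toward the average of the task-specific
        weights. A first-order, Reptile-flavoured sketch, not the exact iTAML rule."""
        base_state = copy.deepcopy(model.state_dict())
        deltas = {k: torch.zeros_like(v) for k, v in base_state.items()
                  if v.dtype.is_floating_point}

        for loader in exemplar_loaders:              # one exemplar loader per seen task
            model.load_state_dict(base_state)        # start every task from the generic weights
            opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
            for _ in range(inner_steps):
                for x, y in loader:
                    opt.zero_grad()
                    F.cross_entropy(model(x), y).backward()
                    opt.step()
            task_state = model.state_dict()
            for k in deltas:                         # accumulate task-specific displacements
                deltas[k] += (task_state[k] - base_state[k]) / len(exemplar_loaders)

        for k in deltas:                             # outer update: step toward the average
            base_state[k] = base_state[k] + meta_lr * deltas[k]
        model.load_state_dict(base_state)
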
Methods
  • For incrementally learning 10 tasks, iTAML surpasses the current state-of-the-art RPS-net [21] by a margin of 21.3%.
  • On the MS-Celeb-10K dataset with 10,000 classes, the proposed iTAML achieves 95.02% accuracy and retains its performance with the addition of new classes.
  • These experiments strongly demonstrate the suitability and effectiveness of iTAML for large-scale incremental learning tasks
Results
  • Datasets: The authors evaluate the method on a wide spectrum of incremental learning benchmarks
  • These include small-scale datasets, i.e., split MNIST [31] and split SVHN, where each task is assigned two classes.
  • Comparison with meta-learning algorithms: Fig. 5 compares different first-order meta-learning algorithms, i.e., FOMAML and Reptile, with iTAML and joint training under task-agnostic, task-aware, and no inference-update settings on CIFAR100 for 10 tasks.
  • Compared with task-agnostic settings, the performance of FOMAML and Reptile improves when the task label is known.
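The task-agnostic numbers above rest on a two-stage inference procedure: the generic model first guesses the task from the incoming data, and is then adapted to that task with a single update before the final prediction. Below is a minimal PyTorch-style sketch of that flow under assumed helpers; exemplar_memory, task_class_ranges, and the max-logit task-scoring rule are illustrative assumptions, not the authors' exact procedure.

    import copy
    import torch
    import torch.nn.functional as F

    def task_agnostic_predict(model, x_batch, exemplar_memory, task_class_ranges, inner_lr=0.01):
        """Sketch of task-agnostic inference: (1) identify the most likely task from the
        generic model's logits, (2) adapt a throwaway copy on that task's exemplars with
        a single gradient step, (3) classify with the adapted copy."""
        # Stage 1: task identification with the generic (un-adapted) model.
        with torch.no_grad():
            logits = model(x_batch)
        task_scores = torch.stack([logits[:, lo:hi].max(dim=1).values.mean()
                                   for (lo, hi) in task_class_ranges])
        task_id = int(task_scores.argmax())

        # Stage 2: one adaptation step on the predicted task's stored exemplars.
        adapted = copy.deepcopy(model)
        opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
        x_mem, y_mem = exemplar_memory[task_id]
        opt.zero_grad()
        F.cross_entropy(adapted(x_mem), y_mem).backward()
        opt.step()

        # Stage 3: final class prediction with the task-adapted copy.
        with torch.no_grad():
            return adapted(x_batch).argmax(dim=1)
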
Conclusion
  • Incremental learning aims to learn a single model that can continuously adapt itself as new information becomes available, without overriding existing knowledge.
  • To this end, this work proposes to update the model such that a common set of parameters is optimized on all so-far-seen tasks, without being specific to a single learning task.
  • The authors' experiments demonstrate consistent improvements across a range of classification datasets including ImageNet, CIFAR100, MNIST, SVHN and MS-Celeb
Tables
  • Table 1: Comparison on MNIST and SVHN datasets. ‘∗’ denotes memory-based methods. iTAML outperforms the state-of-the-art and performs quite close to the oracle case.
  • Table 2: Large-scale experiments on ImageNet-1K and MS-Celeb-10K show that iTAML outperforms all the state-of-the-art methods by a significant margin. Note that the reported task-t accuracy is an average over tasks 1, 2, ..., t.
Related work
  • Existing IL methods propose architectural modifications for deep CNNs, e.g., dynamic networks [21, 22], dual-memory modules [7], and network expansion [25]. Rehearsal-based methods have also been proposed that replay the old tasks by using an exemplar set [23, 4] or synthesize samples using generative models [27, 28]. IL approaches that work fundamentally at the algorithmic level can be grouped into regularization-based and meta-learning-based methods. We discuss these two sets of approaches next.
    Regularization Strategies for Incremental Learning: The regularization-based methods impose constraints during learning that seek to retain past knowledge. For example, learning without forgetting [15] adds a distillation loss to preserve the old knowledge while sequentially learning new tasks. Different from the ‘task-incremental’ setting explored in [15], [23, 2] apply a distillation loss in the ‘class-incremental’ setting to reduce forgetting. A distillation loss on the attention maps of the deep network is proposed in [5] that minimizes overriding of old-task information. Recently, [30] advocates a simple bias-correction strategy that promotes re-balancing the final classifier layer to give equal importance to current and older classes.
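As a concrete reference point for the distillation-based regularizers discussed above (in the spirit of [15, 23]), the following is a minimal sketch of a knowledge-distillation term added to the new-task cross-entropy. The frozen old_model, temperature T, and weight lam are illustrative assumptions, not the exact losses used by the cited methods.

    import torch
    import torch.nn.functional as F

    def distillation_loss(new_logits, old_logits, T=2.0):
        """Soft-target distillation on the old classes: the current model is pushed to
        reproduce the frozen old model's softened predictions, reducing forgetting."""
        old_probs = F.softmax(old_logits / T, dim=1)
        new_log_probs = F.log_softmax(new_logits / T, dim=1)
        return -(old_probs * new_log_probs).sum(dim=1).mean() * (T * T)

    def incremental_loss(model, old_model, x, y, num_old_classes, lam=1.0):
        """Cross-entropy on the new classes plus distillation on the old classes' logits."""
        logits = model(x)
        ce = F.cross_entropy(logits, y)
        with torch.no_grad():
            old_logits = old_model(x)[:, :num_old_classes]
        kd = distillation_loss(logits[:, :num_old_classes], old_logits)
        return ce + lam * kd
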
Contributions
  • Introduces a novel meta-learning approach that seeks to maintain an equilibrium between all the encountered tasks
  • When presented with a continuum of data, our model automatically identifies the task and quickly adapts to it with just a single update
  • Studies class-incremental learning where groups of classes are sequentially observed
  • Proposes to learn a generic meta-model which can be quickly adapted to the desired task
  • Proposes iTAML, a task-agnostic meta-learning algorithm designed for IL settings
Reference
  • Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, and Tinne Tuytelaars. Memory aware synapses: Learning what (not) to forget. In Proceedings of the European Conference on Computer Vision (ECCV), pages 139–154, 2018.
  • Francisco M. Castro, Manuel J. Marin-Jimenez, Nicolas Guil, Cordelia Schmid, and Karteek Alahari. End-to-end incremental learning. In The European Conference on Computer Vision (ECCV), September 2018.
  • Arslan Chaudhry, Puneet K Dokania, Thalaiyasingam Ajanthan, and Philip HS Torr. Riemannian walk for incremental learning: Understanding forgetting and intransigence. In Proceedings of the European Conference on Computer Vision (ECCV), pages 532–547, 2018.
  • Arslan Chaudhry, Marc’Aurelio Ranzato, Marcus Rohrbach, and Mohamed Elhoseiny. Efficient lifelong learning with A-GEM. In ICLR, 2019.
  • Prithviraj Dhar, Rajat Vikram Singh, Kuan-Chuan Peng, Ziyan Wu, and Rama Chellappa. Learning without memorizing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5138–5146, 2019.
  • Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1126–1135. JMLR.org, 2017.
  • Alexander Gepperth and Cem Karaoguz. A bio-inspired incremental learning architecture for applied perceptual problems. Cognitive Computation, 8(5):924–934, 2016.
  • Yandong Guo and Lei Zhang. One-shot face recognition by promoting underrepresented classes. arXiv preprint arXiv:1707.05574, 2017.
  • Yen-Chang Hsu, Yen-Cheng Liu, Anita Ramasamy, and Zsolt Kira. Re-evaluating continual learning scenarios: A categorization and case for strong baselines. NeurIPS Continual Learning Workshop, 2018.
  • Muhammad Abdullah Jamal and Guo-Jun Qi. Task agnostic meta-learning for few-shot learning. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
  • Khurram Javed and Martha White. Meta-learning representations for continual learning. Advances in Neural Information Processing Systems, 2019.
  • Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114:3521–3526, 2017.
  • Sang-Woo Lee, Jin-Hwa Kim, Jaehyun Jun, Jung-Woo Ha, and Byoung-Tak Zhang. Overcoming catastrophic forgetting by incremental moment matching. In Advances in Neural Information Processing Systems, pages 4652–4662, 2017.
  • Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2018.
  • Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265, 2019.
  • David Lopez-Paz and Marc’Aurelio Ranzato. Gradient episodic memory for continual learning. In Advances in Neural Information Processing Systems, pages 6467–6476, 2017.
  • Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999, 2018.
  • Alex Nichol and John Schulman. Reptile: A scalable meta-learning algorithm. arXiv preprint arXiv:1803.02999, 2018.
  • B Pfulb and A Gepperth. A comprehensive, application-oriented study of catastrophic forgetting in DNNs. ICLR, 2019.
  • Jathushan Rajasegaran, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan, and Ling Shao. Random path selection for incremental learning. Advances in Neural Information Processing Systems, 2019.
  • Jathushan Rajasegaran, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan, Ling Shao, and Ming-Hsuan Yang. An adaptive random path selection approach for incremental learning, 2019.
  • Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2001–2010, 2017.
  • Matthew Riemer, Ignacio Cases, Robert Ajemian, Miao Liu, Irina Rish, Yuhai Tu, and Gerald Tesauro. Learning to learn without forgetting by maximizing transfer and minimizing interference. In International Conference on Learning Representations, 2019.
  • Andrei A Rusu, Neil C Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016.
  • Jonathan Schwarz, Jelena Luketina, Wojciech M Czarnecki, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, and Raia Hadsell. Progress & compress: A scalable framework for continual learning. arXiv preprint arXiv:1805.06370, 2018.
  • Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. Continual learning with deep generative replay. In Advances in Neural Information Processing Systems, pages 2990–2999, 2017.
  • Richard S Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Machine Learning Proceedings 1990, pages 216–224. Elsevier, 1990.
  • Gido M van de Ven and Andreas S Tolias. Generative replay with feedback connections as a general strategy for continual learning. arXiv preprint arXiv:1809.10635, 2018.
  • Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, and Yun Fu. Large scale incremental learning. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
  • Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 3987–3995. JMLR.org, 2017.
  • Junting Zhang, Jie Zhang, Shalini Ghosh, Dawei Li, Serafettin Tasci, Larry Heck, Heming Zhang, and C-C Jay Kuo. Class-incremental learning via deep model consolidation. arXiv preprint arXiv:1903.07864, 2019.