Consistent MetaReg: Alleviating Intra-task Discrepancy for Better Meta-knowledge

Pinzhuo Tian
Lei Qi
Shaokang Dong

IJCAI, pp. 2718-2724, 2020.

Keywords:
meta-learning model, gradient-based meta-learning approaches, meta-regularization method, meta-knowledge, extensive experiment

Abstract:

In the few-shot learning scenario, a data-distribution discrepancy between the training data and test data within a task usually exists due to the limited data. However, most existing meta-learning approaches seldom consider this intra-task discrepancy in the meta-training phase, which might deteriorate the performance. To overcome this limitation...

Introduction
  • Learning quickly is a hallmark of human intelligence; for example, children can recognize objects from only a few examples.
  • This poses a great challenge to existing deep learning models, which require large-scale training data to achieve promising performance.
  • To tackle this problem, meta-learning has drawn increasing interest in the machine learning community in recent years [Finn et al., 2017; Rajeswaran et al., 2019].
  • Due to the limited data in each task, a discrepancy between the training and test data within a task usually exists; most existing meta-learning approaches ignore this intra-task discrepancy during the meta-training phase.
Highlights
  • Learning quickly is a hallmark of human intelligence; for example, children can recognize objects from only a few examples.
  • We demonstrate in theory that meta-learning methods equipped with the proposed Consistent Meta-regularization (CM) achieve a lower regret bound, and extensive experiments show that adding CM to conventional meta-learning frameworks yields better performance.
  • MAML-CM, etc., denote the methods integrated with our Consistent Meta-regularization.
  • By mitigating the intra-task discrepancy, our proposed Consistent Meta-regularization improves the average performance of MAML, OptNet, and R2D2 by 0.30% (0.33%), 1.10% (1.90%), and 2.66% (2.59%) on miniImageNet, respectively.
  • Our Consistent Meta-regularization method boosts the average performance of MAML, OptNet, and R2D2 by 3.14%, 3.37%, and 2.72% on office-31.
  • We introduce Consistent Meta-regularization to alleviate the discrepancy for gradient-based meta-learning approaches
Methods
  • The bilevel gradient-based method has attracted much attention and achieves state-of-the-art performance in many computer vision tasks [Lee et al., 2019; Tian et al., 2019].
  • Although MAML and bilevel-based methods learn meta-knowledge by minimizing Eq. 4 and Eq. 7, respectively, all of them ignore the intra-task discrepancy in the meta-training set, which is not conducive to learning good meta-knowledge.
  • The authors minimize the difference between M_i^{tr} and M_i^{ts} as a regularization term that can be inserted into the traditional meta-loss to alleviate the intra-task discrepancy (a sketch of the resulting objective follows this list).
  • In traditional gradient-based meta-learning, the test data S_i^{ts} is used only to minimize the meta-loss.
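  A minimal sketch of how such a consistency term can be attached to a standard gradient-based meta-objective is given below. The exact forms of Eq. 4, Eq. 7, M_i^{tr}, and M_i^{ts} are not reproduced on this page, so the symbols here (task-adapted parameters theta_i', task loss L, inner step size alpha, regularization weight lambda) are illustrative assumptions rather than the paper's definitive formulation:

    % Hedged sketch of a CM-regularized meta-objective (assumed notation, not the paper's exact Eq. 4/7).
    % \theta_i' are the parameters adapted to task i on its support set S_i^{tr}
    % (e.g., by one inner-loop gradient step, as in MAML); \lambda weights the consistency term.
    \min_{\theta} \; \sum_{i} \Big[ \mathcal{L}\big(\theta_i', S_i^{ts}\big)
        \;+\; \lambda \, \big\| M_i^{tr} - M_i^{ts} \big\| \Big],
    \qquad
    \theta_i' \;=\; \theta \;-\; \alpha \, \nabla_{\theta} \mathcal{L}\big(\theta, S_i^{tr}\big)

  The first term is the usual meta-loss on the task test (query) data; the second term penalizes the gap between the training-side and test-side quantities M_i^{tr} and M_i^{ts}, so the support data contributes to the outer objective beyond merely producing theta_i'.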
Results
  • Tab. 1 reports the results on 5-way miniImageNet and tieredImageNet. All reported results are average performances with 95% confidence intervals over 2000 tasks randomly sampled from the meta-testing split.
  • By mitigating the intra-task discrepancy, the proposed CM improves the average performance of MAML, OptNet, and R2D2 by 0.30% (0.33%), 1.10% (1.90%), and 2.66% (2.59%) on miniImageNet, respectively.
  • All reported results are average performances with 95% confidence intervals over 1000 tasks randomly sampled from the meta-testing split (such intervals are typically computed as in the sketch after this list).
  • As for optimization-based methods, the data-distribution discrepancy has a large impact on MAML; similar observations can be found in [Finn and Levine, 2018].
  • The proposed regularization is particularly helpful to optimization-based methods in this case, i.e., when there is a large intra-task difference.
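  For concreteness, the reporting convention above (mean accuracy with a 95% confidence interval over randomly sampled meta-test tasks) can be computed as in the short Python sketch below; the per-task accuracies are dummy values standing in for a trained model's outputs, not numbers from the paper:

    import numpy as np

    def mean_with_95ci(per_task_accuracies):
        """Mean accuracy and 95% confidence half-width over sampled few-shot tasks."""
        accs = np.asarray(per_task_accuracies, dtype=np.float64)
        # Normal-approximation interval, the convention used by most few-shot papers.
        half_width = 1.96 * accs.std(ddof=1) / np.sqrt(len(accs))
        return accs.mean(), half_width

    # Dummy accuracies for 2000 randomly sampled 5-way meta-test tasks (illustration only).
    rng = np.random.default_rng(0)
    dummy_accs = rng.normal(loc=0.62, scale=0.08, size=2000).clip(0.0, 1.0)
    mean, ci = mean_with_95ci(dummy_accs)
    print(f"accuracy: {100 * mean:.2f}% +/- {100 * ci:.2f}%")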
Conclusion
  • The authors consider the intra-task discrepancy issue in the traditional meta-learning models, which is usually ignored in previous works.
  • To handle this issue, the authors introduce Consistent Meta-regularization to alleviate the discrepancy for gradient-based meta-learning approaches.
  • In this paper, the authors follow recent meta-learning approaches, which use deep models as the meta-learner in the non-convex setting.
  • The authors will further provide a theoretical analysis in the non-convex setting.
Tables
  • Table1: Classification results on miniImageNet and tieredImageNet
  • Table2: Classification results on office-31
Related work
  • The topic of meta-learning (or learning to learn) was introduced and studied several decades ago [Schmidhuber, 1987; Bengio et al., 1992; Thrun and Pratt, 1998]. Early work mostly focused on learning how to dynamically adjust the inductive bias [Utgoff, 1986] or the hypothesis space of a learning algorithm [Vilalta and Drissi, 2002]. In recent years, a two-level framework has become popular in meta-learning approaches. The motivation of this framework is that the meta-level (meta-learner) learns meta-knowledge, which helps the base-level (task-specific model, or base-learner) learn fast or adapt quickly to a new task. The two-level framework can be broadly divided into three categories.

    • Metric-based method. In these methods, a nonparametric similarity function is used as the base-learner to evaluate the similarity between examples. The meta-learner is trained to learn useful meta-knowledge in the predefined metric space, e.g., Euclidean-distance-based prototypical networks [Snell et al., 2017] or cosine-similarity-based recurrence with attention mechanisms [Vinyals et al., 2016]; a toy sketch of the metric-based idea follows.
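    As an illustration of this metric-based idea, the sketch below classifies query examples by Euclidean distance to class prototypes (the mean support embedding per class), in the style of prototypical networks; the random vectors merely stand in for the embeddings a learned encoder would produce:

      import numpy as np

      def prototype_classify(support_emb, support_labels, query_emb, n_way):
          """Nearest-prototype classification: assign each query to the class whose
          mean support embedding (prototype) is closest in Euclidean distance."""
          prototypes = np.stack(
              [support_emb[support_labels == c].mean(axis=0) for c in range(n_way)]
          )                                                    # shape (n_way, dim)
          dists = ((query_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
          return dists.argmin(axis=1)                          # predicted class per query

      # Toy 5-way 1-shot example; random vectors stand in for encoder outputs.
      rng = np.random.default_rng(0)
      support = rng.normal(size=(5, 64))
      labels = np.arange(5)
      queries = support + 0.01 * rng.normal(size=(5, 64))      # queries near their own class
      print(prototype_classify(support, labels, queries, n_way=5))  # -> [0 1 2 3 4]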
Funding
  • The work is supported by Science and Technology Innovation 2030-“New Generation Artificial Intelligence” Major Project (No 2018AAA0100905) and NSFC (No 1673203)
Reference
  • [Al-Shedivat et al., 2018] Maruan Al-Shedivat, Trapit Bansal, Yura Burda, Ilya Sutskever, Igor Mordatch, and Pieter Abbeel. Continuous adaptation via meta-learning in nonstationary and competitive environments. In ICLR, 2018.
  • [Amit and Meir, 2018] Ron Amit and Ron Meir. Metalearning by adjusting priors based on extended pac-bayes theory. In ICML, pages 205–214, 2018.
  • [Bengio et al., 1992] Samy Bengio, Yoshua Bengio, Jocelyn Cloutier, and Jan Gecsei. On the optimization of a synaptic learning rule. In Preprints Conf. Optimality in Artificial and Biological Neural Networks, volume 2, 1992.
  • [Bertinetto et al., 2019] Luca Bertinetto, Joao F. Henriques, Philip H. S. Torr, and Andrea Vedaldi. Meta-learning with differentiable closed-form solvers. In ICLR, 2019.
  • [Chen et al., 2019] Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, and Jia-Bin Huang. A closer look at few-shot classification. In ICLR, 2019.
  • [Finn and Levine, 2018] Chelsea Finn and Sergey Levine. Meta-learning and universality: Deep representations and gradient descent can approximate any learning algorithm. In ICLR, 2018.
  • [Finn et al., 2017] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, pages 1126–1135, 2017.
  • [Franceschi et al., 2018] Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, and Massimiliano Pontil. Bilevel programming for hyperparameter optimization and meta-learning. In ICML, pages 1563–1572, 2018.
  • [Hazan et al., 2007] Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. ML, 69(2-3):169–192, 2007.
  • [Javed and White, 2019] Khurram Javed and Martha White. Meta-learning representations for continual learning. In NeurIPS, pages 1818–1828, 2019.
  • [Lee et al., 2019] Kwonjoon Lee, Subhransu Maji, Avinash Ravichandran, and Stefano Soatto. Meta-learning with differentiable convex optimization. In CVPR, pages 10657– 10665, 2019.
  • [Li et al., 2017] Zhenguo Li, Fengwei Zhou, Fei Chen, and Hang Li. Meta-sgd: Learning to learn quickly for few shot learning. arXiv preprint arXiv:1707.09835, 2017.
  • [Munkhdalai and Yu, 2017] Tsendsuren Munkhdalai and Hong Yu. Meta networks. In ICML, pages 2554–2563, 2017.
  • [Na et al., 2019] Donghyun Na, Haebeom Lee, Saehoon Kim, Minseop Park, Eunho Yang, and Sung Ju Hwang. Learning to balance: Bayesian meta-learning for imbalanced and out-of-distribution tasks. arXiv preprint arXiv:1905.12917, 2019.
  • [Rajeswaran et al., 2019] Aravind Rajeswaran, Chelsea Finn, Sham M. Kakade, and Sergey Levine. Metalearning with implicit gradients. In NeurIPS, pages 113–124, 2019.
  • [Rakelly et al., 2019] Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, and Sergey Levine. Efficient offpolicy meta-reinforcement learning via probabilistic context variables. In ICML, pages 5331–5340, 2019.
  • [Ravi and Larochelle, 2017] Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. In ICLR, 2017.
  • [Ren et al., 2018] Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B. Tenenbaum, Hugo Larochelle, and Richard S. Zemel. Meta-learning for semi-supervised few-shot classification. In ICLR, 2018.
  • [Russakovsky et al., 2015] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, and Fei-Fei Li. Imagenet large scale visual recognition challenge. IJCV, 115(3):211–252, 2015.
  • [Saenko et al., 2010] Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. Adapting visual category models to new domains. In ECCV, pages 213–226, 2010.
  • [Schmidhuber, 1987] Jürgen Schmidhuber. Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook. PhD thesis, Technische Universität München, 1987.
  • [Snell et al., 2017] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In NIPS, pages 4077–4087, 2017.
  • [Thrun and Pratt, 1998] Sebastian Thrun and Lorien Y. Pratt, editors. Learning to Learn. Springer, 1998.
  • [Tian et al., 2019] Pinzhuo Tian, Zhangkai Wu, Lei Qi, Lei Wang, Yinghuan Shi, and Yang Gao. Differentiable metalearning model for few-shot semantic segmentation. arXiv preprint arXiv:1911.10371, 2019.
  • [Utgoff, 1986] Paul E Utgoff. Shift of bias for inductive concept learning. Machine learning: An artificial intelligence approach, 1986.
  • [Vilalta and Drissi, 2002] Ricardo Vilalta and Youssef Drissi. A perspective view and survey of meta-learning. Artificial intelligence review, 18(2):77–95, 2002.
  • [Vinyals et al., 2016] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In NIPS, pages 3630–3638, 2016.
  • [Xie et al., 2019] Yujia Xie, Haoming Jiang, Feng Liu, Tuo Zhao, and Hongyuan Zha. Meta learning with relational information for short sequences. In NeurIPS, pages 9901– 9912, 2019.
  • [Zinkevich, 2003] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, pages 928–936, 2003.