Automated Relational Meta-learning

ICLR, 2020.

Keywords:
meta-learning task heterogeneity meta-knowledge graph

Abstract:

In order to efficiently learn with a small amount of data on new tasks, meta-learning transfers knowledge learned from previous tasks to new ones. However, a critical challenge in meta-learning is task heterogeneity, which cannot be well handled by traditional globally shared meta-learning methods. In addition, current task-specific …
Introduction
  • Learning quickly is the key characteristic of human intelligence, which remains a daunting problem in machine intelligence.
  • The key challenge in dealing with task heterogeneity is how to customize the globally shared meta-learner using task-specific information.
  • Learning the underlying structure across tasks provides a more effective way for balancing the customization and generalization.
  • Yao et al. propose a hierarchically structured meta-learning method that customizes the globally shared knowledge to each cluster of tasks (Yao et al., 2019b).
  • However, the hierarchical clustering structure relies entirely on a handcrafted design, which must be tuned carefully and may lack the capacity to capture complex relationships.
Highlights
  • Learning quickly is the key characteristic of human intelligence, which remains a daunting problem in machine intelligence
  • The mechanism of meta-learning is widely used to generalize and transfer prior knowledge learned from previous tasks to improve the effectiveness of learning on new tasks, which has benefited various applications, such as computer vision (Kang et al., 2019; Liu et al., 2019), natural language processing (Gu et al., 2018; Lin et al., 2019) and social good (Zhang et al., 2019; Yao et al., 2019a)
  • Globally shared meta-learners fail to handle tasks drawn from different distributions, which is known as task heterogeneity (Vuorio et al., 2018; Yao et al., 2019b)
  • The key challenge in dealing with task heterogeneity is how to customize the globally shared meta-learner using task-specific information. Recently, a handful of works attempt to solve this problem by learning a task-specific representation for tailoring the transferred knowledge to each task (Oreshkin et al., 2018; Vuorio et al., 2018; Lee & Choi, 2018)
  • We can observe that automated relational meta-learning achieves the best performance compared with competitive gradient-based meta-learning methods, i.e., globally shared models and task-specific models
  • To improve the effectiveness of meta-learning for handling heterogeneous tasks, we propose a new framework called automated relational meta-learning (ARML), which automatically extracts relations across tasks and constructs a meta-knowledge graph; a minimal sketch of the graph query follows this list
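As a rough illustration of how a task queries the meta-knowledge graph, the sketch below soft-attends over learned vertex features with a task embedding. This is a simplified stand-in for the paper's propagation scheme; all names, shapes, and the scaled dot-product scoring are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def query_meta_knowledge_graph(task_embedding, graph_vertices):
    """Retrieve task-relevant knowledge from learned graph vertices.

    task_embedding: (d,) prototype-based representation of the current task.
    graph_vertices: (K, d) learnable vertex features of the meta-knowledge graph.
    Returns a (d,) knowledge vector as an attention-weighted sum of vertices.
    """
    d = task_embedding.shape[0]
    scores = graph_vertices @ task_embedding / d ** 0.5  # (K,) vertex relevance
    weights = F.softmax(scores, dim=0)                   # normalize to a distribution
    return weights @ graph_vertices                      # aggregate relevant vertices

# Usage with illustrative sizes: 8 vertices, 64-dim embeddings.
vertices = torch.randn(8, 64, requires_grad=True)
task_emb = torch.randn(64)
knowledge = query_meta_knowledge_graph(task_emb, vertices)
```

Because the attention weights are trained end to end, related tasks attend to overlapping vertices, which is what allows the graph to transfer knowledge across similar tasks.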
Methods
  • Methods for Comparison

    The authors compare the proposed ARML with two types of baselines: (1) gradient-based meta-learning methods: both globally shared methods (MAML (Finn et al., 2017), Meta-SGD (Li et al., 2017)) and task-specific methods (MT-Net (Lee & Choi, 2018), MUMOMAML (Vuorio et al., 2018), HSML (Yao et al., 2019b), BMAML (Yoon et al., 2018)) are considered; (2) other meta-learning methods: the authors select the globally shared methods VERSA (Gordon et al., 2019), Prototypical Network (ProtoNet) (Snell et al., 2017), and TapNet (Yoon et al., 2019), and the task-specific method TADAM (Oreshkin et al., 2018) as baselines.
  • Detailed implementations of the baselines are discussed in Appendix B.3.
  • A more detailed discussion of the experimental settings is presented in Appendix B.
Results
  • Results and Analysis

    In Figure 2, the authors summarize the interpretation of the meta-knowledge graph and the qualitative results of 10-shot 2D regression.
  • The authors can observe that ARML achieves the best performance compared with competitive gradient-based meta-learning methods, i.e., globally shared models and task-specific models.
  • For MiniImagenet and tieredImagenet, which do not have the characteristic of task heterogeneity, the authors show the results in Table 3 and Table 4, respectively.
  • In these tables, the authors compare the model with other gradient-based meta-learning models.
  • The results of ablation study on 5-way, 5-shot scenario for Art-Multi and PlainMulti datasets are presented in Table 5 and Table 6, respectively.
  • Compared with ablations I, II and III, the better performance of ARML shows that structuring samples as prototypes can (1) better capture the underlying relations and (2) alleviate the effect of potential anomalies; a minimal prototype sketch follows this list
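The prototype-based sample structuring referred to above can be pictured as averaging the support-set features of each class into one prototype, then averaging the prototypes into a task embedding. Below is a minimal sketch in the style of ProtoNet (Snell et al., 2017); the function names and sizes are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def class_prototypes(features, labels, n_classes):
    """Aggregate support-set sample features into one prototype per class.

    Averaging samples into prototypes captures class-level structure and
    damps the influence of anomalous individual samples.
    """
    return torch.stack([features[labels == c].mean(dim=0)
                        for c in range(n_classes)])      # (n_classes, d)

# Usage with illustrative sizes: a 5-way, 5-shot support set, 64-dim features.
feats = torch.randn(25, 64)
labs = torch.arange(5).repeat_interleave(5)              # [0,0,0,0,0,1,1,...]
protos = class_prototypes(feats, labs, n_classes=5)
task_embedding = protos.mean(dim=0)                      # one vector per task
```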
Conclusion
  • To improve the effectiveness of meta-learning for handling heterogeneous tasks, the authors propose a new framework called ARML, which automatically extracts relations across tasks and constructs a meta-knowledge graph.
  • When a new task comes in, it can quickly find the most relevant relations through the meta-knowledge graph and use this knowledge to facilitate its training process (a sketch of this tailoring step follows the list).
  • The authors plan to investigate the problem in the following directions: (1) they are interested in investigating more explainable semantic meaning in the meta-knowledge graph on this problem; (2).
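The tailoring step above can be pictured as modulating the globally shared initialization with the knowledge vector retrieved from the graph. The sketch below uses a simple sigmoid gate as a stand-in for the paper's modulation network (broadly in the spirit of FiLM-style conditioning, Perez et al., 2018); all names and shapes are illustrative assumptions.

```python
import torch

def tailor_initialization(theta0, knowledge, gate_net):
    """Customize the globally shared initialization theta0 for one task.

    A small gating network maps the task's aggregated meta-knowledge to a
    per-parameter scale in (0, 1), so that related tasks (which retrieve
    similar knowledge) start from similar initializations.
    """
    gamma = torch.sigmoid(gate_net(knowledge))  # (n_params,) gate values
    return gamma * theta0                       # task-specific initialization

# Usage with illustrative sizes: 64-dim knowledge, 128 model parameters.
gate_net = torch.nn.Linear(64, 128)
theta0 = torch.randn(128, requires_grad=True)
knowledge = torch.randn(64)
theta_task = tailor_initialization(theta0, knowledge, gate_net)
```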
Tables
  • Table1: Overall few-shot classification results (accuracy ± 95% confidence) on Plain-Multi dataset
  • Table2: Overall few-shot classification results (accuracy ± 95% confidence) on Art-Multi dataset
  • Table3: Performance comparison on the 5-way, 1-shot MiniImagenet dataset
  • Table4: Performance comparison on the 5-way, 1-shot tieredImagenet dataset
  • Table5: Full evaluation results of model ablation study on Art-Multi dataset. B, T, A, F represent bird, texture, aircraft, fungi, respectively. Plain means the original image.
  • Table6: Results of Model Ablation (5-way, 5-shot results) on Plain-Multi dataset
  • Table7: Full evaluation results of performance vs. number of vertices of the meta-knowledge graph on Art-Multi. B, T, A, F represent bird, texture, aircraft, fungi, respectively. Plain means the original image.
  • Table8: Full results on Art-Multi dataset. In this table, B, T, A, F represent bird, texture, aircraft, fungi, respectively. Plain means the original image.
Related work
  • Meta-learning designs models to learn new tasks or adapt to new environments quickly with a few training examples. There are mainly three research lines of meta-learning: (1) black-box amortized methods design black-box meta-learners to infer the model parameters (Ravi & Larochelle, 2016; Andrychowicz et al., 2016; Mishra et al., 2018; Gordon et al., 2019); (2) gradient-based methods aim to learn an optimized initialization of model parameters, which can be adapted to new tasks by a few steps of gradient descent (Finn et al., 2017; 2018; Lee & Choi, 2018; Yoon et al., 2018; Grant et al., 2018); (3) non-parametric methods combine parametric meta-learners and non-parametric learners to learn an appropriate distance metric for few-shot classification (Snell et al., 2017; Vinyals et al., 2016; Yang et al., 2018; Oreshkin et al., 2018; Yoon et al., 2019; Garcia & Bruna, 2017).

    Our work is built upon gradient-based meta-learning methods. In this line, most algorithms learn a globally shared meta-learner from previous tasks (Finn et al., 2017; Li et al., 2017; Flennerhag et al., 2019) to improve the effectiveness of the learning process on new tasks; a minimal sketch of this inner/outer-loop scheme follows this paragraph. However, these algorithms typically lack the ability to handle heterogeneous tasks (i.e., tasks sampled from sufficiently different distributions). To tackle this challenge, recent works tailor the globally shared initialization to different tasks by customizing the initialization (Vuorio et al., 2018; Yao et al., 2019b) or by using probabilistic models (Yoon et al., 2018; Finn et al., 2018). Representatively, HSML customizes the globally shared initialization with a manually designed hierarchical clustering structure to balance generalization and customization (Yao et al., 2019b). However, the handcrafted hierarchical structure may not accurately reflect the real task structure, and clustering constrains the complexity of the relationships it can express. Compared with these methods, ARML leverages the most relevant structure from the automatically constructed meta-knowledge graph. Thus, ARML not only discovers more accurate underlying structures that improve the effectiveness of meta-learning algorithms, but its meta-knowledge graph also enhances model interpretability.
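To make the gradient-based line concrete, here is a minimal sketch of the MAML meta-update (Finn et al., 2017) on a toy linear-regression model; the task format and learning rates are illustrative assumptions.

```python
import torch

def maml_step(theta, tasks, inner_lr=0.01, outer_lr=0.001):
    """One meta-update of MAML: adapt a copy of the shared initialization
    theta on each task's support set, then update theta using the query-set
    loss of the adapted parameters."""
    meta_grad = torch.zeros_like(theta)
    for support_x, support_y, query_x, query_y in tasks:
        # Inner loop: one gradient step on the support set.
        inner_loss = ((support_x @ theta - support_y) ** 2).mean()
        grad, = torch.autograd.grad(inner_loss, theta, create_graph=True)
        theta_prime = theta - inner_lr * grad
        # Outer loss: evaluate the adapted parameters on the query set.
        outer_loss = ((query_x @ theta_prime - query_y) ** 2).mean()
        meta_grad += torch.autograd.grad(outer_loss, theta)[0]
    # Meta-update of the shared initialization, averaged over tasks.
    return (theta - outer_lr * meta_grad / len(tasks)).detach().requires_grad_()

# Usage with illustrative sizes: 4 tasks, 3 parameters, 10 samples per set.
theta = torch.randn(3, requires_grad=True)
tasks = [(torch.randn(10, 3), torch.randn(10),
          torch.randn(10, 3), torch.randn(10)) for _ in range(4)]
theta = maml_step(theta, tasks)
```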
Funding
  • The work was supported in part by NSF awards #1652525 and #1618448
  • The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing any funding agencies
Reference
  • Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando de Freitas. Learning to learn by gradient descent by gradient descent. In NeurIPS, pp. 3981–3989, 2016.
  • Chelsea Finn and Sergey Levine. Meta-learning and universality: Deep representations and gradient descent can approximate any learning algorithm. In ICLR, 2018.
  • Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, pp. 1126–1135, 2017.
  • Chelsea Finn, Kelvin Xu, and Sergey Levine. Probabilistic model-agnostic meta-learning. In NeurIPS, 2018.
  • Sebastian Flennerhag, Pablo G. Moreno, Neil D. Lawrence, and Andreas Damianou. Transferring knowledge across learning processes. In ICLR, 2019.
  • Victor Garcia and Joan Bruna. Few-shot learning with graph neural networks. In ICLR, 2017.
  • Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In ICML, pp. 1263–1272, 2017.
  • Jonathan Gordon, John Bronskill, Matthias Bauer, Sebastian Nowozin, and Richard E. Turner. Meta-learning probabilistic inference for prediction. In ICLR, 2019.
  • Erin Grant, Chelsea Finn, Sergey Levine, Trevor Darrell, and Thomas Griffiths. Recasting gradient-based meta-learning as hierarchical Bayes. In ICLR, 2018.
  • Jiatao Gu, Yong Wang, Yun Chen, Kyunghyun Cho, and Victor O. K. Li. Meta-learning for low-resource neural machine translation. In EMNLP, 2018.
  • Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NeurIPS, pp. 1024–1034, 2017.
  • Ghassen Jerfel, Erin Grant, Thomas L. Griffiths, and Katherine Heller. Reconciling meta-learning and continual learning with online mixtures of tasks. In NeurIPS, 2019.
  • Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, and Trevor Darrell. Few-shot object detection via feature reweighting. In ICCV, 2019.
  • Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
  • Yoonho Lee and Seungjin Choi. Gradient-based meta-learning with learned layerwise metric and subspace. In ICML, pp. 2933–2942, 2018.
  • Zhenguo Li, Fengwei Zhou, Fei Chen, and Hang Li. Meta-SGD: Learning to learn quickly for few-shot learning. arXiv preprint arXiv:1707.09835, 2017.
  • Zhaojiang Lin, Andrea Madotto, Chien-Sheng Wu, and Pascale Fung. Personalizing dialogue agents via meta-learning. In ACL, 2019.
  • Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, and Jan Kautz. Few-shot unsupervised image-to-image translation. arXiv preprint arXiv:1905.01723, 2019.
  • Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel. A simple neural attentive meta-learner. In ICLR, 2018.
  • Alex Nichol and John Schulman. Reptile: a scalable metalearning algorithm. arXiv preprint arXiv:1803.02999, 2018.
  • Boris Oreshkin, Pau Rodríguez López, and Alexandre Lacoste. TADAM: Task dependent adaptive metric for improved few-shot learning. In NeurIPS, pp. 721–731, 2018.
  • Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron C. Courville. FiLM: Visual reasoning with a general conditioning layer. In AAAI, 2018.
  • Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. In ICLR, 2016.
  • Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B. Tenenbaum, Hugo Larochelle, and Richard S. Zemel. Meta-learning for semi-supervised few-shot classification. In ICLR, 2018.
  • Andrei A. Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, and Raia Hadsell. Meta-learning with latent embedding optimization. In ICLR, 2019.
  • Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In NeurIPS, pp. 4077–4087, 2017.
  • Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In ICLR, 2018.
  • Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In NeurIPS, pp. 3630–3638, 2016.
  • Risto Vuorio, Shao-Hua Sun, Hexiang Hu, and Joseph J. Lim. Toward multimodal model-agnostic meta-learning. arXiv preprint arXiv:1812.07172, 2018.
  • Xin Wang, Fisher Yu, Ruth Wang, Trevor Darrell, and Joseph E. Gonzalez. TAFE-Net: Task-aware feature embeddings for low shot learning. In CVPR, pp. 1831–1840, 2019.
  • Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip H. S. Torr, and Timothy M. Hospedales. Learning to compare: Relation network for few-shot learning. In CVPR, 2018.
  • Huaxiu Yao, Yiding Liu, Ying Wei, Xianfeng Tang, and Zhenhui Li. Learning from multiple cities: A meta-learning approach for spatial-temporal prediction. In WWW, 2019a.
  • Huaxiu Yao, Ying Wei, Junzhou Huang, and Zhenhui Li. Hierarchically structured meta-learning. In ICML, pp. 7045–7054, 2019b.
  • Jaesik Yoon, Taesup Kim, Ousmane Dia, Sungwoong Kim, Yoshua Bengio, and Sungjin Ahn. Bayesian model-agnostic meta-learning. In NeurIPS, pp. 7343–7353, 2018.
  • Sung Whan Yoon, Jun Seo, and Jaekyun Moon. TapNet: Neural network augmented with task-adaptive projection for few-shot learning. In ICML, 2019.
  • Xi Sheryl Zhang, Fengyi Tang, Hiroko Dodge, Jiayu Zhou, and Fei Wang. MetaPred: Meta-learning for clinical risk prediction with limited patient electronic health records. In KDD, 2019.