Learning Task-aware Local Representations for Few-shot Learning

IJCAI, pp. 716-722, 2020.

Keywords:
local representations; ATL-Net; entire task; Task-aware Local Representations Network; visual recognition

Abstract:

Few-shot learning for visual recognition aims to adapt to novel unseen classes with only a few images. Recent work, especially work based on low-level information, has achieved great progress. In these works, local representations (LRs) are typically employed, because LRs are more consistent between the seen and unseen classes. However, ...

Introduction
  • Deep learning based methods [Krizhevsky et al., 2012; He et al., 2016] have achieved state-of-the-art performance on a variety of visual recognition tasks
  • These supervised methods need a large amount of labeled data with diverse visual variations to train a network effectively.
  • How to recognize classes from extremely few labeled examples, i.e., few-shot learning, has therefore attracted a lot of interest in machine learning.
Highlights
  • Deep learning based methods [Krizhevsky et al., 2012; He et al., 2016] have achieved state-of-the-art performance on a variety of visual recognition tasks
  • We propose a novel episodic attention mechanism that explores and weights discriminative semantic patches across the entire task, aiming to learn task-aware local representations for few-shot learning
  • Note that the proposed ATL-Net achieves an improvement of 3.06%/2.20% under the 5-way 1-shot/5-shot settings over the most relevant work [Li et al., 2019b], which exploits the relation at the class level via k-nearest neighbor selection
  • To verify the superiority of value-based selection over rank-based selection [Li et al., 2019b], we replace the adaptive attention module with a k-nearest neighbor selection; the peak and surrounding results under the 5-way 5-shot setting are reported in Figure 3
  • We propose an Adaptive Task-aware Local Representation Network (ATL-Net) for few-shot learning, which learns more discriminative local representations by taking a view of the entire task
  • An adaptive episodic attention mechanism adaptively selects the key semantic patches for a specific task, without attention being distracted by the common parts shared by most classes
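The idea of weighting query patches by their discriminative power against the whole support set of a task can be illustrated with a minimal sketch. This is not the paper's actual architecture (which uses learned modules FΨ and FΓ); here cosine similarity, a fixed threshold `tau`, and a sigmoid gate stand in for the learned components.

```python
import numpy as np

def episodic_attention(query_lrs, support_lrs, tau=0.5):
    """Illustrative sketch: weight each query local representation (LR)
    by how well it matches the support LRs of the entire task.

    query_lrs:   (m, d) array of m query-patch descriptors
    support_lrs: (n, d) array of all support-patch descriptors in the task
    tau:         fixed stand-in for the adaptive threshold ATL-Net predicts
    """
    # Cosine similarity between every query LR and every support LR.
    q = query_lrs / np.linalg.norm(query_lrs, axis=1, keepdims=True)
    s = support_lrs / np.linalg.norm(support_lrs, axis=1, keepdims=True)
    sim = q @ s.T                      # (m, n) similarity matrix

    # Value-based attention: a sigmoid gate keeps patches whose best
    # task-level match clears the threshold, instead of a fixed top-k.
    best = sim.max(axis=1)             # best support match per query patch
    weights = 1.0 / (1.0 + np.exp(-(best - tau) * 10.0))
    return weights / weights.sum()     # normalized attention over patches
```

A patch that matches the support set closely receives most of the attention mass, while a patch with no good match is suppressed rather than merely ranked lower.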
Methods
Results
  • Results on miniImageNet

    The results on miniImageNet are summarized in Table 2.
  • The authors achieve 54.30% under the 5-way 1-shot setting, an improvement of 0.93% over the second best [Cai et al., 2018].
  • Compared with [Cai et al., 2018], the proposed ATL-Net introduces simpler additional structures (i.e., FΨ and FΓ) than the complex memory-addressing architectures.
  • Note that the proposed ATL-Net achieves an improvement of 3.06%/2.20% under the 5-way 1-shot/5-shot settings over the most relevant work [Li et al., 2019b], which exploits the relation at the class level via k-NN selection
Conclusion
  • To verify the superiority of value-based selection over rank-based selection [Li et al., 2019b], the authors replace the adaptive attention module with a k-NN selection; the peak and surrounding results under the 5-way 5-shot setting are reported in Figure 3.
  • The authors observe that it is difficult for the rank-based method to select enough corresponding LRs at the task level without introducing too much noise.
  • Extensive experimental results on the benchmarks verify the effectiveness and superiority of the proposed ATL-Net
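The contrast between the two selection schemes can be sketched in a few lines. Rank-based (k-NN) selection always returns exactly k patches, so when fewer than k truly relevant patches exist it is forced to admit noise; value-based selection lets the count adapt to the task. The threshold below is a fixed illustrative stand-in for the adaptively predicted one.

```python
import numpy as np

def rank_based_select(sims, k):
    # k-NN style: always keep exactly the top-k similarities, noise or not.
    idx = np.argsort(sims)[::-1][:k]
    return set(idx.tolist())

def value_based_select(sims, thresh):
    # Value-based: keep every LR whose similarity clears the threshold,
    # so the number of selected patches adapts to the task.
    return set(np.flatnonzero(sims >= thresh).tolist())
```

For similarities `[0.9, 0.8, 0.2, 0.1]`, rank-based selection with k=3 must include the noisy patch at index 2, while value-based selection with a 0.5 threshold keeps only the two genuinely matching patches.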
Summary
  • Methods:

    4.1 Datasets. miniImageNet [Vinyals et al., 2016] is a subset of ImageNet [Deng et al., 2009], which consists of 100 classes with 600 images per class.
  • The authors also evaluate the method on three fine-grained image classification datasets.
  • Stanford Dogs [Khosla et al., 2011] contains 120 categories with a total of 20,580 images.
  • Stanford Cars [Krause et al., 2013] contains 196 classes of cars and 16,185 images.
  • CUB-200 [Welinder et al., 2010] contains 200 bird species with a total of 6,033 images.
  • The authors use the data splits of [Li et al., 2019b; Li et al., 2019c; Huang et al., 2019], as Table 1 shows
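Evaluation on these datasets follows the standard episodic protocol: each episode samples N classes, with K labeled support images and a set of query images per class. A minimal sketch of that sampling (with hypothetical dict-of-lists data layout and a `q_query` parameter not taken from the paper) might look like:

```python
import random

def sample_episode(class_to_images, n_way=5, k_shot=1, q_query=15, rng=None):
    """Sample one N-way K-shot episode from a dict {class: [image ids]}."""
    rng = rng or random.Random()
    # Pick N classes, then split each class's sampled images into
    # K support examples and q_query query examples.
    classes = rng.sample(sorted(class_to_images), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        imgs = rng.sample(class_to_images[cls], k_shot + q_query)
        support += [(img, label) for img in imgs[:k_shot]]
        query += [(img, label) for img in imgs[k_shot:]]
    return support, query
```

Training episodes are drawn from the auxiliary (training) classes and test episodes from the disjoint test classes, which is what the class splits in Table 1 define.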
Tables
  • Table 1: The splits of the three fine-grained datasets. Nall is the total number of classes; Ntrain, Nval and Ntest indicate the number of classes in the training (auxiliary), validation and test sets
  • Table 2: Comparison with other methods on miniImageNet. The second column shows which kind of embedding module is employed. The third column denotes whether the model contains an additional training stage, e.g., a pre-train or fine-tune stage. We use the officially provided results for all the other methods. For each setting, the best and the second best results are highlighted
  • Table 3: Comparison with other methods on three fine-grained datasets. We adopt the results from [Li et al., 2019c] for the first three methods and the officially provided results for the other methods. For each setting, the best and the second best results are highlighted
  • Table 4: Ablation study on miniImageNet for the proposed ATL-Net
  • Table 5: The number of trainable parameters along with 5-way 5-shot performance of different models
Related work
  • The recent literature on few-shot learning mainly falls into two categories: meta-learning based methods and metric-learning based methods.

    2.1 Meta-learning based Methods

    Meta-learning based methods learn the learning algorithm itself. [Santoro et al., 2016] proposes an LSTM-based meta-learner that interacts with an external memory module. The framework in [Santoro et al., 2016] adopts an LSTM-based meta-learner to learn a distinct optimization algorithm for training a classifier, as well as a task-aware initialization for this classifier. MAML and its variants [Finn et al., 2017] train a meta-learner to provide a suitable parameter initialization, so that the model can be quickly adapted to a novel task. Similarly, [Li et al., 2017] adjusts the update direction and learning rate for quick adaptation to a novel task. [Cai et al., 2018] introduces memory slots to construct a contextual learner that predicts the parameters of an embedding module for unlabeled images.

    Nevertheless, these methods often need costly higher-order gradients or a complicated memory structure, which makes them difficult to train and can lead to failure when scaling to deeper network architectures [Mishra et al., 2018]. Compared with methods in this branch, the proposed ATL-Net achieves competitive results with a much simpler network architecture, trained from scratch without fine-tuning.
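The "learn an initialization, adapt quickly" idea behind MAML [Finn et al., 2017] can be shown on a toy problem. This first-order sketch (closer to FOMAML than to full MAML, which would differentiate through the inner step) uses hypothetical 1-D tasks with loss L(θ) = ½(θ − t)², so both gradients are analytic:

```python
def maml_step(theta, tasks, inner_lr=0.1, outer_lr=0.1):
    """Toy first-order MAML-style update for 1-D least-squares tasks.

    Each task is a target t with loss 0.5 * (theta - t)**2, whose gradient
    is (theta - t). One inner gradient step adapts theta per task; the
    outer update moves the shared initialization using the gradients
    evaluated at the adapted parameters.
    """
    meta_grad = 0.0
    for t in tasks:
        adapted = theta - inner_lr * (theta - t)   # inner adaptation step
        meta_grad += (adapted - t)                 # gradient after adaptation
    return theta - outer_lr * meta_grad / len(tasks)
```

For symmetric tasks the per-task gradients cancel and the initialization stays centered; for a shifted task distribution the initialization drifts toward it, which is the behavior the meta-learner is trained to exploit.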
Funding
  • This work is supported by the National Key R&D Program of China (2018YFB1402600), NSFC (61806092) and the Jiangsu Natural Science Foundation (No. BK20180326)
References
  • [Cai et al., 2018] Qi Cai, Yingwei Pan, Ting Yao, Chenggang Yan, and Tao Mei. Memory matching networks for one-shot image recognition. In CVPR, pages 4080–4088, 2018.
  • [Chu et al., 2019] Wen-Hsuan Chu, Yu-Jhe Li, Jing-Cheng Chang, and Yu-Chiang Frank Wang. Spot and learn: A maximum-entropy patch sampler for few-shot image classification. In CVPR, pages 6251–6260, 2019.
  • [Deng et al., 2009] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248–255, 2009.
  • [Finn et al., 2017] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, pages 1126–1135, 2017.
  • [He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
  • [Huang et al., 2019] Huaxi Huang, Junjie Zhang, Jian Zhang, Jingsong Xu, and Qiang Wu. Low-rank pairwise alignment bilinear network for few-shot fine-grained image classification. arXiv preprint arXiv:1908.01313, 2019.
  • [Khosla et al., 2011] Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Fei-Fei Li. Novel dataset for fine-grained image categorization: Stanford Dogs. In CVPR Workshop on Fine-Grained Visual Categorization (FGVC), 2011.
  • [Kingma and Ba, 2015] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
  • [Koch et al., 2015] Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop, volume 2, 2015.
  • [Krause et al., 2013] Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3D object representations for fine-grained categorization. In ICCV Workshops, pages 554–561, 2013.
  • [Krizhevsky et al., 2012] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NeurIPS, pages 1097–1105, 2012.
  • [Li et al., 2017] Zhenguo Li, Fengwei Zhou, Fei Chen, and Hang Li. Meta-SGD: Learning to learn quickly for few-shot learning. arXiv preprint arXiv:1707.09835, 2017.
  • [Li et al., 2019a] Aoxue Li, Tiange Luo, Tao Xiang, Weiran Huang, and Liwei Wang. Few-shot learning with global class representations. In ICCV, pages 9715–9724, 2019.
  • [Li et al., 2019b] Wenbin Li, Lei Wang, Jinglin Xu, Jing Huo, Yang Gao, and Jiebo Luo. Revisiting local descriptor based image-to-class measure for few-shot learning. In CVPR, pages 7260–7268, 2019.
  • [Li et al., 2019c] Wenbin Li, Jinglin Xu, Jing Huo, Lei Wang, Yang Gao, and Jiebo Luo. Distribution consistency based covariance metric networks for few-shot learning. In AAAI, volume 33, pages 8642–8649, 2019.
  • [Mishra et al., 2018] Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel. A simple neural attentive meta-learner. In ICLR, 2018.
  • [Paszke et al., 2019] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In NeurIPS, pages 8024–8035, 2019.
  • [Santoro et al., 2016] Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-learning with memory-augmented neural networks. In ICML, pages 1842–1850, 2016.
  • [Satorras and Estrach, 2018] Victor Garcia Satorras and Joan Bruna Estrach. Few-shot learning with graph neural networks. In ICLR, 2018.
  • [Snell et al., 2017] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In NeurIPS, pages 4077–4087, 2017.
  • [Sun et al., 2018] Ming Sun, Yuchen Yuan, Feng Zhou, and Errui Ding. Multi-attention multi-class constraint for fine-grained image recognition. In ECCV, pages 805–821, 2018.
  • [Sung et al., 2018] Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip H. S. Torr, and Timothy M. Hospedales. Learning to compare: Relation network for few-shot learning. In CVPR, pages 1199–1208, 2018.
  • [Vinyals et al., 2016] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In NeurIPS, pages 3630–3638, 2016.
  • [Wang et al., 2018] Yu-Xiong Wang, Ross Girshick, Martial Hebert, and Bharath Hariharan. Low-shot learning from imaginary data. In CVPR, pages 7278–7286, 2018.
  • [Welinder et al., 2010] P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology, 2010.
  • [Zhang et al., 2018] Ruixiang Zhang, Tong Che, Zoubin Ghahramani, Yoshua Bengio, and Yangqiu Song. MetaGAN: An adversarial approach to few-shot learning. In NeurIPS, pages 2365–2374, 2018.