Few-shot Autoregressive Density Estimation: Towards Learning to Learn Distributions

International Conference on Learning Representations, 2018.

Keywords:
few-shot density estimation, natural image, meta learning, autoregressive density estimation, image mirroring

Abstract:

Deep autoregressive models have shown state-of-the-art performance in density estimation for natural images on large-scale datasets such as ImageNet. However, such models require many thousands of gradient-based weight updates and unique image examples for training. Ideally, the models would rapidly learn visual concepts from only a handful of examples.

Introduction
  • Contemporary machine learning systems are still far behind humans in their ability to rapidly learn new visual concepts from only a few examples (Lake et al, 2013)
  • This setting, called few-shot learning, has been studied using deep neural networks and many other approaches in the context of discriminative models, for example by Vinyals et al (2016) and Santoro et al (2016).
  • The authors can add complexity in directions orthogonal to the generative model itself
Highlights
  • Contemporary machine learning systems are still far behind humans in their ability to rapidly learn new visual concepts from only a few examples (Lake et al, 2013)
  • Comparatively little attention has been devoted to the task of few-shot image density estimation; that is, the problem of learning a model of a probability distribution from a small number of examples
  • Autoregressive neural networks are useful for studying few-shot density estimation for several reasons
  • We show how attention can improve performance on the few-shot density estimation problem by enabling the model to transmit texture information from the support set onto the target image canvas (see the sketch after this list)
  • Compared to several strong baselines, we showed that Attention PixelCNN achieves state-of-the-art results on Omniglot and promising results on natural images
  • In the Meta PixelCNN model, we showed that recently proposed methods for gradient-based meta learning can be used for few-shot density estimation, and achieve state-of-the-art results in terms of likelihood on Omniglot
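    The attention read highlighted above can be pictured as standard scaled dot-product attention from target-pixel features (queries) to support-image features (keys and values). The following Python/PyTorch snippet is a minimal sketch under that reading; the names, shapes, and single-head form are illustrative assumptions, and in the actual model such reads are interleaved with masked PixelCNN layers rather than standing alone.

        import torch
        import torch.nn.functional as F

        def attention_read(queries, keys, values):
            # queries: (B, P, D)  one feature vector per target-image pixel
            # keys:    (B, S, D)  features at every support-image position
            # values:  (B, S, Dv) texture features to copy onto the canvas
            scores = torch.bmm(queries, keys.transpose(1, 2)) / keys.size(-1) ** 0.5
            weights = F.softmax(scores, dim=-1)  # where to look in the support set
            return torch.bmm(weights, values)    # (B, P, Dv), merged back into the decoder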
Methods
  • The authors describe experiments on image flipping, Omniglot, and Stanford Online Products.
  • The support set encoder f(s) has the following structure: in parallel over support images, a 5 × 5 conv layer, followed by a sequence of 3 × 3 convolutions and max-pooling until the spatial dimension is 1.
  • The per-image encodings are then concatenated and fed through two fully-connected layers to obtain the support set embedding (see the encoder sketch after this list).
  • The authors consider the problem of image flipping as few-shot learning.
  • The authors find that Attention PixelCNN learned to solve the task; interestingly, the baseline conditional PixelCNN and Meta PixelCNN did not
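    A minimal PyTorch sketch of the support set encoder f(s) as described above. The channel width, embedding size, number of support images, and 32 × 32 input size are illustrative assumptions; the summary fixes only the 5 × 5 and 3 × 3 kernel sizes, the pool-until-1×1 structure, and the concatenate-then-two-FC-layers readout.

        import torch
        import torch.nn as nn

        class SupportSetEncoder(nn.Module):
            def __init__(self, in_ch=1, width=64, embed_dim=256,
                         n_support=4, img_size=32):
                super().__init__()
                layers = [nn.Conv2d(in_ch, width, kernel_size=5, padding=2),
                          nn.ReLU()]
                size = img_size
                # 3x3 conv + 2x2 max-pool blocks until the spatial dim is 1
                # (img_size is assumed to be a power of two here).
                while size > 1:
                    layers += [nn.Conv2d(width, width, kernel_size=3, padding=1),
                               nn.ReLU(),
                               nn.MaxPool2d(2)]
                    size //= 2
                self.conv = nn.Sequential(*layers)
                # Concatenated per-image codes pass through two FC layers.
                self.fc = nn.Sequential(
                    nn.Linear(n_support * width, embed_dim), nn.ReLU(),
                    nn.Linear(embed_dim, embed_dim))

            def forward(self, support):  # support: (B, n_support, C, H, W)
                b, n, c, h, w = support.shape
                feats = self.conv(support.reshape(b * n, c, h, w))  # (B*n, width, 1, 1)
                feats = feats.reshape(b, -1)  # concatenate over the support set
                return self.fc(feats)         # (B, embed_dim)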
Results
  • The authors show how attention can improve performance on the few-shot density estimation problem by enabling the model to transmit texture information from the support set onto the target image canvas.
  • Compared to several strong baselines, the authors showed that Attention PixelCNN achieves state-of-the-art results on Omniglot and promising results on natural images
Conclusion
  • In this paper, the authors adapted PixelCNN to the task of few-shot density estimation.
  • Compared to several strong baselines, the authors showed that Attention PixelCNN achieves state-of-the-art results on Omniglot and promising results on natural images.
  • In the Meta PixelCNN model, the authors showed that recently proposed methods for gradient-based meta learning can be used for few-shot density estimation, achieving state-of-the-art likelihoods on Omniglot
Tables
  • Table 1: Omniglot test (train) few-shot density estimation NLL in nats/dim. Bornschein et al (2017) refers to Variational Memory Addressing and Gregor et al (2016) to ConvDRAW
  • Table 2: Omniglot NLL in nats/pixel with four support examples. Attention Meta PixelCNN is a model combining attention with gradient-based weight updates for few-shot learning
Related work
  • Learning to learn, or meta-learning, has been studied in cognitive science and machine learning for decades (Harlow, 1949; Thrun & Pratt, 1998; Hochreiter et al, 2001). In the context of modern deep networks, Andrychowicz et al (2016) learned a gradient descent optimizer by gradient descent, itself parameterized as a recurrent network. Chen et al (2017) showed how to learn to learn by gradient descent in the black-box optimization setting.

    Ravi & Larochelle (2017) showed the effectiveness of learning an optimizer in the few-shot learning setting. Finn et al (2017a) advanced a simplified yet effective variation in which the optimizer is not learned but rather fixed as one or a few steps of gradient descent, and the meta-learning problem reduces to learning an initial set of base parameters θ that can be adapted to minimize any task loss Lt by a single step of gradient descent, i.e. θ′ = θ − α∇θLt(θ). This approach was further shown to be effective in imitation learning, including on real robotic manipulation tasks (Finn et al, 2017b). Shyam et al (2017) train a neural attentive recurrent comparator function to perform one-shot classification on Omniglot.
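    The single inner step θ′ = θ − α∇θLt(θ) can be written directly in PyTorch; the sketch below uses torch.func.functional_call (PyTorch ≥ 2.0) to evaluate the model under the adapted parameters, in the spirit of a Meta PixelCNN-style objective where the inner loss is the support set NLL. The nll placeholder, the learning rate, and the single-step form are assumptions drawn from the description above, not the paper's exact implementation.

        import torch
        from torch.func import functional_call

        def meta_loss(model, nll, support, target, alpha=0.1):
            # nll(model_output) -> scalar negative log-likelihood (placeholder).
            params = dict(model.named_parameters())
            inner = nll(functional_call(model, params, (support,)))  # L_t(theta)
            grads = torch.autograd.grad(inner, tuple(params.values()),
                                        create_graph=True)  # keep graph for the outer step
            # One step of gradient descent on the support loss:
            # theta' = theta - alpha * grad_theta L_t(theta)
            adapted = {k: p - alpha * g
                       for (k, p), g in zip(params.items(), grads)}
            # Outer objective: NLL of the target under the adapted parameters,
            # still differentiable w.r.t. the initial theta.
            return nll(functional_call(model, adapted, (target,)))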
References
  • Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando de Freitas. Learning to learn by gradient descent by gradient descent. 2016.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
  • S Bartunov and DP Vetrov. Fast adaptation in generative models with generative matching networks. arXiv preprint arXiv:1612.02192, 2016.
  • Jorg Bornschein, Andriy Mnih, Daniel Zoran, and Danilo J. Rezende. Variational memory addressing in generative models. 2017.
  • Yutian Chen, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Timothy P. Lillicrap, and Nando de Freitas. Learning to learn for global optimization of black box functions. In ICML, 2017.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pp. 248–255, 2009.
  • Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, and Wojciech Zaremba. One-shot imitation learning. arXiv preprint arXiv:1703.07326, 2017.
  • Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. 2017a.
  • Chelsea Finn, Tianhe Yu, Tianhao Zhang, Pieter Abbeel, and Sergey Levine. One-shot visual imitation learning via meta-learning. arXiv preprint arXiv:1709.04905, 2017b.
  • Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N Dauphin. Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122, 2017.
  • Karol Gregor, Ivo Danihelka, Alex Graves, Danilo J. Rezende, and Daan Wierstra. DRAW: A recurrent neural network for image generation. In Proceedings of The 32nd International Conference on Machine Learning, pp. 1462–1471, 2015.
  • Karol Gregor, Frederic Besse, Danilo J. Rezende, Ivo Danihelka, and Daan Wierstra. Towards conceptual compression. In Advances in Neural Information Processing Systems, pp. 3549–3557, 2016.
  • Harry F Harlow. The formation of learning sets. Psychological Review, 56(1):51, 1949.
  • Brenden M Lake, Ruslan R Salakhutdinov, and Josh Tenenbaum. One-shot learning by inverting a compositional causal process. In NIPS, pp. 2526–2534, 2013.
  • Gergely Neu and Csaba Szepesvari. Apprenticeship learning using inverse reinforcement learning and gradient methods. arXiv preprint arXiv:1206.5264, 2012.
  • Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. In ICLR, 2017.
  • Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. Generative adversarial text-to-image synthesis. In ICML, pp. 1060–1069, 2016.
  • Scott E. Reed, Aaron van den Oord, Nal Kalchbrenner, Sergio Gomez, Ziyu Wang, Dan Belov, and Nando de Freitas. Parallel multiscale autoregressive density estimation. In ICML, 2017.
  • Danilo J. Rezende, Ivo Danihelka, Karol Gregor, Daan Wierstra, et al. One-shot generalization in deep generative models. In Proceedings of The 33rd International Conference on Machine Learning, pp. 1521–1529, 2016.
  • Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-learning with memory-augmented neural networks. In ICML, 2016.
  • Pranav Shyam, Shubham Gupta, and Ambedkar Dukkipati. Attentive recurrent comparators. In ICML, 2017.
  • Linda Smith and Michael Gasser. The development of embodied cognition: Six lessons from babies. Artificial Life, 11(1-2):13–29, 2005.
  • Hyun Oh Song, Yu Xiang, Stefanie Jegelka, and Silvio Savarese. Deep metric learning via lifted structured feature embedding. In CVPR, 2016.
  • Elizabeth S Spelke and Katherine D Kinzler. Core knowledge. Developmental Science, 10(1):89–96, 2007.
  • Sebastian Thrun and Lorien Pratt. Learning to learn. Springer Science & Business Media, 1998.
  • Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, and Koray Kavukcuoglu. Conditional image generation with PixelCNN decoders. In NIPS, 2016.
  • Oriol Vinyals, Charles Blundell, Tim Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In NIPS, 2016.
  • Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Richard Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, pp. 2048–2057, 2015.