Bayesian active learning for posterior estimation

IJCAI'15: Proceedings of the 24th International Joint Conference on Artificial Intelligence, pp. 3605-3611, 2015.

Keywords:
Bayesian quadrature, cosmological parameter, random samples, machine learning, Gaussian process

Abstract:

This paper studies active posterior estimation in a Bayesian setting when the likelihood is expensive to evaluate. Existing techniques for posterior estimation are based on generating samples representative of the posterior. Such methods do not consider efficiency in terms of likelihood evaluations. In order to be query efficient we treat...

Introduction
  • Computing the posterior distribution of parameters given observations is a central problem in statistics.
  • The authors only have access to a black box which computes the likelihood for a given value of the parameters.
  • Physicists have developed simulation-based probability models of the Universe which can be used to compute the likelihood of cosmological parameters for a given observation.
  • Expensive simulators in molecular mechanics, computational biology and neuroscience are used to model many scientific processes.
Highlights
  • Computing the posterior distribution of parameters given observations is a central problem in statistics
  • Our implementation uses Gaussian processes (GPs) [Rasmussen and Williams, 2006], and we demonstrate the efficacy of the methods on multiple synthetic and real experiments.
  • Here $\hat{P}_{A_{t-1} \cup \{(\theta_+, L(\theta_+))\}}$ is our estimate of the posterior using $A_{t-1} \cup \{(\theta_+, L(\theta_+))\}$. This objective is not accessible in practice, since we know neither $P_{\theta \mid X^{\mathrm{obs}}}$ nor $L(\theta_+)$. As surrogates to this ideal objective in Equation (3), in the following subsections we propose two utility functions for determining the next query point: Negative Expected Divergence (NED) and Exponentiated Variance (EV); an illustrative EV-style acquisition sketch appears after this list.
  • In our experiments we found that both Exponentiated Variance and Negative Expected Divergence performed well.
  • We proposed a framework for query-efficient posterior estimation for expensive black-box likelihood evaluations.
  • Our work demonstrates that when likelihood evaluations are expensive, such a regression-based treatment of posterior estimation can be very fruitful.
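As a rough illustration of the Exponentiated Variance (EV) idea, the sketch below picks the next query by maximising the variance of the exponentiated surrogate of the log joint density. This is a minimal sketch, not the authors' implementation: the scikit-learn GaussianProcessRegressor surrogate, the RBF kernel, and the helper names ev_acquisition and choose_next_query are assumptions made here for illustration.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def ev_acquisition(mu, sigma):
        # Variance of exp(g) when g ~ N(mu, sigma^2) (log-normal variance identity).
        # Large values flag points where the surrogate is most uncertain on the
        # probability scale rather than the log-probability scale.
        return np.exp(2.0 * mu + sigma ** 2) * (np.exp(sigma ** 2) - 1.0)

    def choose_next_query(thetas, log_joint_vals, candidates):
        # thetas:         (n, d) previously queried parameters
        # log_joint_vals: (n,)   log prior + log likelihood at those parameters
        # candidates:     (m, d) candidate locations, e.g. samples from the prior
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
        gp.fit(thetas, log_joint_vals)                       # surrogate for the log joint
        mu, sigma = gp.predict(candidates, return_std=True)  # predictive mean / std dev
        utility = ev_acquisition(mu, sigma)
        return candidates[np.argmax(utility)]                # send this point to the black box

In a full loop one would evaluate the expensive likelihood at the chosen point, append the result to the query set, and re-fit the surrogate before selecting the next point.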
Methods
  • The authors first look at a series of low- and high-dimensional synthetic and real astrophysical experiments.
  • NED is tested only on low-dimensional problems, since its empirical approximation and numerical integration are computationally expensive in high dimensions.
  • The kernel bandwidth was set to $5 n^{-1/d}$, where $n$ is the total number of queries and $d$ is the dimension (a one-line sketch of this rule appears after this list).
  • This choice follows several kernel methods (such as kernel density estimation) whose bandwidths shrink with the number of samples at a dimension-dependent rate.
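For concreteness, the bandwidth rule quoted above amounts to the following one-liner; the constant 5 is the value reported here, and the function name is illustrative only.

    def kernel_bandwidth(n, d, c=5.0):
        # Smoothing-kernel bandwidth c * n**(-1/d), where n is the number of
        # likelihood queries made so far and d is the dimension of the parameter space.
        return c * n ** (-1.0 / d)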
Conclusion
  • The authors proposed a framework for query-efficient posterior estimation for expensive black-box likelihood evaluations.
  • The authors demonstrate that the methods outperform natural alternatives in practice.
  • Note that in machine learning it is uncommon to treat posterior estimation in a regression setting.
  • This is probably because the estimate will depend on the intricacies of the regression algorithm.
  • The authors' work demonstrates that when likelihood evaluations are expensive, such a regression-based treatment can be very fruitful (a low-dimensional sketch of this regression view follows this list).
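As a sketch of what treating posterior estimation as regression can look like in one dimension, the snippet below fits a surrogate to the queried log joint values, exponentiates its predictive mean on a grid, and normalises numerically. The scikit-learn GP surrogate, the uniform grid, and the simple Riemann-sum normalisation are assumptions for illustration, not the paper's exact procedure.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def posterior_estimate_1d(thetas, log_joint_vals, grid):
        # thetas:         (n,) queried parameter values
        # log_joint_vals: (n,) log prior + log likelihood at those values
        # grid:           (m,) uniformly spaced evaluation grid over the parameter range
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
        gp.fit(thetas.reshape(-1, 1), log_joint_vals)
        log_mean = gp.predict(grid.reshape(-1, 1))
        unnorm = np.exp(log_mean - log_mean.max())  # subtract the max for numerical stability
        dx = grid[1] - grid[0]
        return unnorm / (unnorm.sum() * dx)         # normalise to integrate to one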
Related work
  • Practitioners have conventionally used sampling schemes [MacKay, 2003] to approximate posterior distributions. Rejection sampling and various MCMC methods are common choices. The advantage of MCMC approaches is their theoretical guarantees with large sample sets [Robert and Casella, 2005], and thus they are a good choice when likelihood evaluations are cheap. However, none of them is designed to be query efficient when evaluations are expensive. Some methods spend most of their computation evaluating point likelihoods and then discard the likelihood values after an acceptance test; the sketch below illustrates this pattern. This gives insight into the potential gains possible by retaining those likelihoods for use in regression. Despite such deficiencies, MCMC remains one of the most popular techniques for posterior estimation in experimental science [Foreman-Mackey et al., 2013; Parkinson et al., 2006; Landau and Binder, 2005; Liu, 2001].
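To make the point about discarded evaluations concrete, here is a minimal random-walk Metropolis-Hastings sketch (written for this summary, not taken from any of the cited packages): every step pays for one call to the expensive likelihood, and the values of rejected proposals are simply discarded.

    import numpy as np

    def metropolis_hastings(log_likelihood, log_prior, theta0, n_steps,
                            step_size=0.1, seed=0):
        # Random-walk Metropolis-Hastings over a d-dimensional parameter vector.
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, dtype=float)
        log_p = log_likelihood(theta) + log_prior(theta)
        samples = []
        for _ in range(n_steps):
            proposal = theta + step_size * rng.standard_normal(theta.shape)
            log_p_prop = log_likelihood(proposal) + log_prior(proposal)  # expensive call
            if np.log(rng.uniform()) < log_p_prop - log_p:
                theta, log_p = proposal, log_p_prop                      # accept
            # On rejection, log_p_prop is simply thrown away; this is exactly the
            # information a regression-based method would retain and reuse.
            samples.append(theta.copy())
        return np.asarray(samples)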
Funding
  • This research was partly funded by DOE grant DESC0011114.
Reference
  • Ryan Prescott Adams, Iain Murray, and David J. C. MacKay. The Gaussian Process Density Sampler. In NIPS, 2008.
  • Eric Brochu, Vlad M. Cora, and Nando de Freitas. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. CoRR, 2010.
  • Brent Bryan, Jeff Schneider, Robert Nichol, Christopher Miller, Christopher Genovese, and Larry Wasserman. Active learning for identifying function threshold boundaries. In Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA, 2006.
  • T. M. Davis et al. Scrutinizing Exotic Cosmological Models Using ESSENCE Supernova Data Combined with Other Cosmological Probes. The Astrophysical Journal, pages 716–725, 2007.
  • Daniel Foreman-Mackey, David W. Hogg, Dustin Lang, and Jonathan Goodman. emcee: The MCMC Hammer, January 2013.
  • Kenji Fukumizu, Le Song, and Arthur Gretton. Kernel Bayes’ Rule: Bayesian Inference with Positive Definite Kernels. Journal of Machine Learning Research, 2014.
  • Daniel Golovin and Andreas Krause. Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization. Journal of Artificial Intelligence Research (JAIR), 2011.
  • Alkis Gotovos, Nathalie Casati, Gregory Hitz, and Andreas Krause. Active Learning for Level Set Estimation. In IJCAI 2013, Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China, August 3-9, 2013, 2013.
  • Tom Gunter, Michael A. Osborne, Roman Garnett, Philipp Hennig, and Stephen J. Roberts. Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature. In Advances in Neural Information Processing Systems, 2014.
  • Laszlo Gyorfi, Michael Kohler, Adam Krzyzak, and Harro Walk. A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics, 2002.
  • Kirthevasan Kandasamy, Jeff Schneider, and Barnabas Poczos. High Dimensional Bayesian Optimisation and Bandits via Additive Models. In International Conference on Machine Learning, 2015.
  • Andreas Krause, Ajit Singh, and Carlos Guestrin. NearOptimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies. J. Mach. Learn. Res., 2008.
  • David Landau and Kurt Binder. A Guide to Monte Carlo Simulations in Statistical Physics. Cambridge University Press, 2005.
  • Jun S. Liu. Monte Carlo strategies in Scientific computing. Springer, 2001.
  • Yifei Ma, Roman Garnett, and Jeff Schneider. Active Area Search via Bayesian Quadrature. In International Conference on Artificial Intelligence and Statistics, 2014.
  • David J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
  • Jean-Michel Marin, Pierre Pudlo, Christian P. Robert, and Robin J. Ryder. Approximate Bayesian computational methods. Statistics and Computing, 2012.
  • Paul Marjoram, John Molitor, Vincent Plagnol, and Simon Tavare. Markov Chain Monte Carlo without Likelihoods. Proceedings of the National Academy of Sciences of the United States of America, 2003.
  • J.B. Mockus and L.J. Mockus. Bayesian approach to global optimization and application to multiobjective and constrained problems. Journal of Optimization Theory and Applications, 1991.
  • M. Osborne, D. Duvenaud, R. Garnett, C. Rasmussen, S. Roberts, and Z. Ghahramani. Active Learning of Model Evidence Using Bayesian Quadrature. In Neural Information Processing Systems, 2012.
  • David Parkinson, Pia Mukherjee, and Andrew R Liddle. A Bayesian model selection analysis of WMAP3. Physical Review, D73:123523, 2006.
  • Gareth W. Peters, Y. Fan, and Scott A. Sisson. On sequential Monte Carlo, partial rejection control and approximate Bayesian computation. Statistics and Computing, 2012.
  • C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning series. University Press Group Limited, 2006.
  • Christian P. Robert and George Casella. Monte Carlo Statistical Methods (Springer Texts in Statistics). Springer-Verlag New York, Inc., 2005.
  • Sambu Seo, Marko Wallat, Thore Graepel, and Klaus Obermayer. Gaussian Process Regression: Active Data Selection and Test Point Rejection. In International Joint Conference on Neural Networks, 2000.
  • Burr Settles. Active Learning Literature Survey. Technical report, University of Wisconsin-Madison, 2010.
  • John Skilling. Nested sampling for general Bayesian computation. Bayesian Anal., 2006.
  • Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. In International Conference on Machine Learning, 2010.
  • M. Tegmark et al. Cosmological Constraints from the SDSS Luminous Red Galaxies. Physical Review, December 2006.
Best Paper of IJCAI, 2015