# Bayesian active learning for posterior estimation

IJCAI'15: Proceedings of the 24th International Joint Conference on Artificial Intelligence, pp. 3605–3611, 2015.

Abstract:

This paper studies active posterior estimation in a Bayesian setting when the likelihood is expensive to evaluate. Existing techniques for posterior estimation are based on generating samples representative of the posterior; such methods do not consider efficiency in terms of likelihood evaluations. In order to be query efficient, we treat...

Introduction

- Computing the posterior distribution of parameters given observations is a central problem in statistics.
- The authors only have access to a black box which computes the likelihood for a given value of the parameters.
- Physicists have developed simulation-based probability models of the Universe which can be used to compute the likelihood of cosmological parameters for a given observation.
- Expensive simulators in molecular mechanics, computational biology and neuroscience are used to model many scientific processes
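As a minimal sketch of the setting (with a cheap stand-in for what would really be an expensive simulator call), the naive approach evaluates the black-box likelihood on a dense grid, one evaluation per grid point, which is exactly the cost that query-efficient methods aim to avoid. All function names here are hypothetical illustrations, not the paper's code:

```python
import numpy as np

def blackbox_likelihood(theta):
    # Hypothetical stand-in for an expensive simulator call;
    # here just a Gaussian bump so the example runs instantly.
    return np.exp(-0.5 * ((theta - 0.5) / 0.1) ** 2)

def grid_posterior(likelihood, prior, grid):
    # One likelihood evaluation per grid point -- the cost that
    # query-efficient methods try to reduce.
    unnorm = np.array([likelihood(t) * prior(t) for t in grid])
    dx = grid[1] - grid[0]
    return unnorm / (unnorm.sum() * dx)  # normalise to integrate to 1

grid = np.linspace(0.0, 1.0, 201)
posterior = grid_posterior(blackbox_likelihood, lambda t: 1.0, grid)  # uniform prior
```

With 201 grid points in one dimension this is already 201 simulator calls; the cost grows exponentially with dimension, which motivates choosing query points actively.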

Highlights

- Computing the posterior distribution of parameters given observations is a central problem in statistics
- Our implementation uses Gaussian processes (GP) [Rasmussen and Williams, 2006] and we demonstrate the efficacy of the methods on multiple synthetic and real experiments
- Here P̂(At−1 ∪ {(θ+, L(θ+))}) is the posterior estimate constructed from the augmented query set At−1 ∪ {(θ+, L(θ+))}. This objective is not accessible in practice, since we know neither Pθ|Xobs nor L(θ+). As surrogates for this ideal objective in Equation (3), in the following subsections we propose two utility functions for choosing the next query point: Negative Expected Divergence (NED) and Exponentiated Variance (EV)
- In our experiments we found that both Exponentiated Variance and Negative Expected Divergence performed well
- We proposed a framework for query efficient posterior estimation for expensive blackbox likelihood evaluations
- Our work demonstrates that when likelihood evaluations are expensive, such a regression-based treatment of posterior estimation can outperform conventional sampling approaches
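The Exponentiated Variance utility mentioned above can be sketched with a toy GP surrogate. This is not the authors' implementation; the kernel, bandwidth, noise level, and the stand-in log-likelihood below are all illustrative assumptions. The idea: if the GP models the log-likelihood g with g(θ) ~ N(μ(θ), σ²(θ)), then exp(g(θ)) is lognormal, and its variance (exp(σ²) − 1)·exp(2μ + σ²) can serve as the acquisition value for the next query:

```python
import numpy as np

def rbf(A, B, h=0.2):
    # Squared-exponential kernel with bandwidth h (assumed value).
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / h ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Standard GP regression equations [Rasmussen and Williams, 2006].
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kss = rbf(Xs, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(Kss - Ks.T @ Kinv @ Ks)
    return mu, np.maximum(var, 1e-12)

def ev_utility(mu, var):
    # If g ~ N(mu, var) models the log-likelihood, exp(g) is lognormal
    # with variance (exp(var) - 1) * exp(2*mu + var): the EV score.
    return (np.exp(var) - 1.0) * np.exp(2.0 * mu + var)

def log_lik(theta):
    # Hypothetical cheap stand-in for the expensive log-likelihood.
    return -0.5 * ((theta - 0.3) / 0.05) ** 2

cand = np.linspace(0.0, 1.0, 200)       # candidate query points
X = np.array([0.1, 0.5, 0.9])           # points queried so far
y = np.array([log_lik(t) for t in X])
mu, var = gp_posterior(X, y, cand)
theta_next = cand[np.argmax(ev_utility(mu, var))]  # next point to query
```

In an active loop, θ_next would be sent to the expensive likelihood, the pair added to the query set, and the GP refit.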

Methods

- The authors first look at a series of low and high dimensional synthetic and real astrophysical experiments.
- NED is only tested on low-dimensional problems, since its empirical approximation and numerical integration are computationally expensive in high dimensions.
- The bandwidth for the kernel was set to 5n^(-1/d), where n is the total number of queries and d is the dimension.
- This follows the bandwidth choices common to several kernel methods (such as kernel density estimation).
Conclusion

- The authors proposed a framework for query efficient posterior estimation for expensive blackbox likelihood evaluations.
- The authors demonstrate that the methods outperform natural alternatives in practice.
- Note that in machine learning it is uncommon to treat posterior estimation in a regression setting.
- This is probably because the estimate will depend on the intricacies of the regression algorithm.
- The authors' work demonstrates that when likelihood evaluations are expensive, such a regression-based treatment of posterior estimation can outperform conventional sampling approaches.

Related work

- Practitioners have conventionally used sampling schemes [MacKay, 2003] to approximate posterior distributions. Rejection sampling and various MCMC methods are common choices. The advantage of MCMC approaches is their theoretical guarantees in the large-sample limit [Robert and Casella, 2005], and thus they are a good choice when likelihood evaluations are cheap. However, none of them is designed to be query efficient when evaluations are expensive. Some methods spend most of their computation evaluating point likelihoods and then discard those values after an acceptance test. This gives insight into the potential gains possible by retaining the likelihoods for use in regression. Despite such deficiencies, MCMC remains one of the most popular techniques for posterior estimation in experimental science [Foreman-Mackey et al., 2013; Parkinson et al., 2006; Landau and Binder, 2005; Liu, 2001].
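The point about discarded likelihoods can be made concrete with a toy Metropolis sampler (a generic sketch, not any cited implementation): every proposal costs one likelihood evaluation, but plain MCMC keeps only the accepted states, whereas all the (θ, L(θ)) pairs could feed a regression-based posterior estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_lik(theta):
    # Stand-in for the expensive black-box log-likelihood.
    return -0.5 * theta ** 2

def metropolis(n_steps, step=1.0):
    theta = 0.0
    ll = log_lik(theta)
    samples, evaluations = [], []        # evaluations: every (theta, L) pair seen
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal()
        ll_prop = log_lik(prop)          # an expensive call ...
        evaluations.append((prop, ll_prop))  # ... retained here, but
        if np.log(rng.random()) < ll_prop - ll:  # plain MCMC would discard it
            theta, ll = prop, ll_prop            # on rejection
        samples.append(theta)
    return samples, evaluations

samples, evals = metropolis(1000)
```

After 1000 steps, `evals` holds all 1000 evaluated pairs, rejections included, while the chain itself only ever "uses" the accepted subset.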

Funding

- This research was partly funded by DOE grant DE-SC0011114.

References

- Ryan Prescott Adams, Iain Murray, and David J. C. MacKay. The Gaussian Process Density Sampler. In NIPS, 2008.
- Eric Brochu, Vlad M. Cora, and Nando de Freitas. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. CoRR, 2010.
- Brent Bryan, Jeff Schneider, Robert Nichol, Christopher Miller, Christopher Genovese, and Larry Wasserman. Active learning for identifying function threshold boundaries. In Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA, 2006.
- T. M. Davis et al. Scrutinizing Exotic Cosmological Models Using ESSENCE Supernova Data Combined with Other Cosmological Probes. The Astrophysical Journal, pages 716–725, 2007.
- Daniel Foreman-Mackey, David W. Hogg, Dustin Lang, and Jonathan Goodman. emcee: The MCMC Hammer, January 2013.
- Kenji Fukumizu, Le Song, and Arthur Gretton. Kernel Bayes’ Rule: Bayesian Inference with Positive Definite Kernels. Journal of Machine Learning Research, 2014.
- Daniel Golovin and Andreas Krause. Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization. Journal of Artificial Intelligence Research (JAIR), 2011.
- Alkis Gotovos, Nathalie Casati, Gregory Hitz, and Andreas Krause. Active Learning for Level Set Estimation. In IJCAI 2013, Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China, August 3-9, 2013, 2013.
- Tom Gunter, Michael A. Osborne, Roman Garnett, Philipp Hennig, and Stephen J. Roberts. Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature. In Advances in Neural Information Processing Systems, 2014.
- Laszlo Gyorfi, Michael Kohler, Adam Krzyzak, and Harro Walk. A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics, 2002.
- Kirthevasan Kandasamy, Jeff Schneider, and Barnabas Poczos. High Dimensional Bayesian Optimisation and Bandits via Additive Models. In International Conference on Machine Learning, 2015.
- Andreas Krause, Ajit Singh, and Carlos Guestrin. NearOptimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies. J. Mach. Learn. Res., 2008.
- David Landau and Kurt Binder. A Guide to Monte Carlo Simulations in Statistical Physics. Cambridge University Press, 2005.
- Jun S. Liu. Monte Carlo strategies in Scientific computing. Springer, 2001.
- Yifei Ma, Roman Garnett, and Jeff Schneider. Active Area Search via Bayesian Quadrature. In International Conference on Artificial Intelligence and Statistics, 2014.
- David J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
- Jean-Michel Marin, Pierre Pudlo, Christian P. Robert, and Robin J. Ryder. Approximate Bayesian computational methods. Statistics and Computing, 2012.
- Paul Marjoram, John Molitor, Vincent Plagnol, and Simon Tavare. Markov Chain Monte Carlo without Likelihoods. Proceedings of the National Academy of Sciences of the United States of America, 2003.
- J.B. Mockus and L.J. Mockus. Bayesian approach to global optimization and application to multiobjective and constrained problems. Journal of Optimization Theory and Applications, 1991.
- M. Osborne, D. Duvenaud, R. Garnett, C. Rasmussen, S. Roberts, and Z. Ghahramani. Active Learning of Model Evidence Using Bayesian Quadrature. In Neural Information Processing Systems, 2012.
- David Parkinson, Pia Mukherjee, and Andrew R Liddle. A Bayesian model selection analysis of WMAP3. Physical Review, D73:123523, 2006.
- Gareth W. Peters, Y. Fan, and Scott A. Sisson. On sequential Monte Carlo, partial rejection control and approximate Bayesian computation. Statistics and Computing, 2012.
- C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. Adaptative computation and machine learning series. University Press Group Limited, 2006.
- Christian P. Robert and George Casella. Monte Carlo Statistical Methods (Springer Texts in Statistics). Springer-Verlag New York, Inc., 2005.
- Sambu Seo, Marko Wallat, Thore Graepel, and Klaus Obermayer. Gaussian Process Regression: Active Data Selection and Test Point Rejection. In International Joint Conference on Neural Networks, 2000.
- Burr Settles. Active Learning Literature Survey. Technical report, University of Wisconsin-Madison, 2010.
- John Skilling. Nested sampling for general Bayesian computation. Bayesian Anal., 2006.
- Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. In International Conference on Machine Learning, 2010.
- M. Tegmark et al. Cosmological Constraints from the SDSS Luminous Red Galaxies. Physical Review, December 2006.

Best Paper

Best Paper of IJCAI 2015
