# Taking the Human Out of the Loop: A Review of Bayesian Optimization

Proceedings of the IEEE, Volume 104, Issue 1, 2015, Pages 148-175.

Abstract:

Big Data applications are typically associated with systems involving large numbers of users, massive complex software systems, and large-scale heterogeneous computing and storage architectures. The construction of such systems involves many distributed design choices. The end products (e.g., recommendation systems, medical analysis tools, …)


Introduction

- Modern systems, from sensor networks that monitor ecological systems to large software stacks, involve many tunable configuration parameters
- These parameters are often specified and hard-coded into the software by the developers who design it
- As a motivating example, consider the CPLEX solver for scheduling and planning: it has 76 free parameters, which the designers must tune manually
- This review paper introduces Bayesian optimization, highlights some of its methodological aspects, and showcases a wide range of applications

Highlights

- Modern systems, from sensor networks that monitor ecological systems to large software stacks, involve many tunable configuration parameters
- We have introduced Bayesian optimization from a modeling perspective
- Beginning with the beta-Bernoulli and linear models, and extending them to nonparametric models, we recover a wide range of approaches to Bayesian optimization that have been introduced in the literature
- In addition to outlining different modeling choices, we have considered many of the design decisions that go into building Bayesian optimization systems
- We further highlighted relevant theory as well as practical considerations that arise when applying these techniques to real-world problems
- We provided a history of Bayesian optimization and related fields and surveyed some of the many successful applications of these methods
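The beta-Bernoulli model mentioned above is the simplest starting point the review builds from; a minimal sketch of what it looks like in practice is Thompson sampling on a Bernoulli bandit (the arm probabilities, round count, and function name below are illustrative assumptions, not taken from the paper):

```python
import random

def thompson_bernoulli(true_probs, n_rounds, seed=0):
    """Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors.

    Each round: sample a success probability from every arm's Beta
    posterior, pull the arm whose sample is highest, then update that
    arm's posterior with the observed 0/1 reward.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # posterior successes + 1
    beta = [1.0] * k   # posterior failures + 1
    pulls = [0] * k
    for _ in range(n_rounds):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, alpha, beta
```

Because posterior sampling naturally balances exploration and exploitation, the pull counts concentrate on the best arm as rounds accumulate; the nonparametric (GP-based) methods surveyed in the review generalize this same posterior-sampling idea to continuous domains.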

Conclusion

- The authors have introduced Bayesian optimization from a modeling perspective.
- There has been a great deal of work focused on designing acquisition functions; the authors take the perspective that acquisition-function design plays a role secondary to the choice of the underlying surrogate model.
- In addition to outlining different modeling choices, the authors have considered many of the design decisions that are used to build Bayesian optimization systems.
- The authors discussed extensions of the basic framework to new problem domains, which often require new kinds of surrogate models.
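To make the acquisition-function discussion concrete, here is a minimal sketch of expected improvement under a Gaussian posterior, one of the classic acquisition functions the review covers (the `xi` exploration offset and the function name are illustrative choices, not the paper's notation):

```python
import math

def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement (maximization) of a Gaussian posterior
    N(mu, sigma^2) over the incumbent value `best`.

    EI(x) = (mu - best - xi) * Phi(z) + sigma * phi(z),
    with z = (mu - best - xi) / sigma.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # std normal pdf
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # std normal cdf
    return (mu - best - xi) * Phi + sigma * phi
```

Note that EI depends on the surrogate only through the posterior mean `mu` and standard deviation `sigma` at a candidate point, which is why the choice of surrogate model (GP, random forest, neural network) shapes the optimizer's behavior more than the acquisition formula itself.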

Summary


- Table 1: List of Several Popular Open Source Software Libraries for Bayesian Optimization as of May 2015

Authors

- Ryan P. Adams received the Ph.D. degree in physics from the University of Cambridge, Cambridge, U.K., in 2009.
- Nando de Freitas received the Ph.D. degree in Bayesian methods for neural networks from Trinity College, Cambridge University, Cambridge, U.K., in 2000.
- He is a Machine Learning Professor at Oxford University, Oxford, U.K. and a Senior Staff Research Scientist at Google DeepMind, U.K. From 1999 to 2001, he was a Postdoctoral Fellow at the University of California Berkeley, Berkeley, CA, USA, in the artificial intelligence group. He was a Professor at the University of British Columbia, Vancouver, BC, Canada, from 2001 to 2013.
- Prof. de Freitas is a Fellow of the Canadian Institute For Advanced Research (CIFAR) in the successful Neural Computation and Adaptive Perception program. Among his recent awards are the 2012 Charles A. McDowell Award for Excellence in Research and the 2010 Mathematics of Information Technology and Complex Systems (MITACS) Young Researcher Award.
