Taking the Human Out of the Loop: A Review of Bayesian Optimization

    Shahriari, B.
    Swersky, K.
    Wang, Z.

    Proceedings of the IEEE, Volume 104, Issue 1, January 2016, Pages 148-175.

    Keywords:
    decision making, design of experiments, genomic medicine, optimization

    Abstract:

    Big Data applications are typically associated with systems involving large numbers of users, massive complex software systems, and large-scale heterogeneous computing and storage architectures. The construction of such systems involves many distributed design choices. The end products (e.g., recommendation systems, medical analysis tools...)

    Introduction
    • These systems range from sensor networks that monitor ecological systems to large software systems, and they involve many tunable configuration parameters
    • Developers design software to drive computers and various electronic devices; these parameters are often specified and hard-coded into the software
    • Consider, for example, the CPLEX solver for scheduling and planning: it has 76 free parameters, which the designers must tune manually
    • This review paper introduces Bayesian optimization, highlights some of its methodological aspects, and showcases a wide range of applications
    Highlights
    • Many systems, from sensor networks that monitor ecological systems to large software systems, involve tunable configuration parameters
    • We have introduced Bayesian optimization from a modeling perspective
    • Beginning with the beta-Bernoulli and linear models, and extending them to nonparametric models, we recover a wide range of approaches to Bayesian optimization that have been introduced in the literature
    • In addition to outlining different modeling choices, we have considered many of the design decisions that are used to build Bayesian optimization systems
    • We further highlighted relevant theory as well as practical considerations that arise when applying these techniques to real-world problems
    • We provided a history of Bayesian optimization and related fields and surveyed some of the many successful applications of these methods
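The beta-Bernoulli model mentioned above is the simplest surrogate in the family the review builds up from. As an illustrative sketch only (the arm probabilities and the `thompson_sampling_bernoulli` helper are invented here, not taken from the paper), Thompson sampling under independent Beta-Bernoulli posteriors can look like this:

```python
import random

def thompson_sampling_bernoulli(true_probs, n_rounds, seed=0):
    """Toy Beta-Bernoulli Thompson sampling for a Bernoulli bandit.

    Each arm keeps a Beta(alpha, beta) posterior over its success
    probability; every round we draw one sample per posterior, pull
    the arm with the highest sample, and update that arm's posterior.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # Beta(1, 1) uniform priors
    beta = [1.0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        # Sample a plausible success probability for each arm.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Hypothetical three-armed bandit; the posterior should concentrate
# on the arm with the highest success probability (0.8).
pulls = thompson_sampling_bernoulli([0.2, 0.5, 0.8], n_rounds=2000)
```

With a seeded generator the run is reproducible; after enough rounds most pulls should go to the best arm, which is the exploration-exploitation behavior the review formalizes.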
    Conclusion
    • The authors have introduced Bayesian optimization from a modeling perspective.
    • There has been a great deal of work focused on designing acquisition functions; the authors have taken the perspective that this choice plays a secondary role to the choice of the underlying surrogate model.
    • In addition to outlining different modeling choices, the authors have considered many of the design decisions that are used to build Bayesian optimization systems.
    • The authors discussed extensions of the basic framework to new problem domains, which often require new kinds of surrogate models.
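The interplay between surrogate model and acquisition function described above can be made concrete with a toy loop. This is a minimal sketch under invented assumptions (a Gaussian-process surrogate with a fixed-length-scale RBF kernel, an upper-confidence-bound acquisition, a discrete candidate grid), not the paper's algorithm:

```python
import math

def solve(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def rbf(a, b, ell=0.5):
    """Squared-exponential kernel with an assumed length scale."""
    return math.exp(-0.5 * (a - b) ** 2 / ell ** 2)

def gp_posterior(xs, ys, x_star, noise=1e-4):
    """GP posterior mean and variance at x_star given data (xs, ys)."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(x, x_star) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, solve(K, ys)))
    var = rbf(x_star, x_star) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, max(var, 0.0)

def bayes_opt(f, candidates, n_iter=8, kappa=2.0):
    """Toy BO loop: GP surrogate + upper-confidence-bound acquisition."""
    xs = [candidates[0], candidates[-1]]  # two initial observations
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        def ucb(x):
            m, v = gp_posterior(xs, ys, x)
            return m + kappa * math.sqrt(v)
        x_next = max((x for x in candidates if x not in xs), key=ucb, default=None)
        if x_next is None:
            break
        xs.append(x_next)
        ys.append(f(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

f = lambda x: -(x - 0.3) ** 2        # hypothetical objective, maximum at 0.3
cands = [i / 20 for i in range(21)]  # candidate grid on [0, 1]
x_best, y_best = bayes_opt(f, cands)
```

Swapping the acquisition (say, expected improvement for UCB) changes one small function, whereas changing the surrogate changes the entire posterior computation; that asymmetry is the point the authors emphasize.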
    Tables
    • Table 1: List of several popular open-source software libraries for Bayesian optimization, as of May 2015
    Reference
    • R. P. Adams and O. Stegle, ‘‘Gaussian process product models for nonparametric nonstationarity,’’ in Proc. Int. Conf. Mach. Learn., 2008, pp. 1–8.
      Google ScholarLocate open access versionFindings
    • S. Agrawal and N. Goyal, ‘‘Thompson sampling for contextual bandits with linear payoffs,’’ in Proc. Int. Conf. Mach. Learn., 2013, pp. 127–135.
      Google ScholarLocate open access versionFindings
    • E. B. Anderes and M. L. Stein, ‘‘Estimating deformations of isotropic Gaussian random fields on the plane,’’ Ann. Stat., vol. 36, no. 2, pp. 719–741, 2008.
      Google ScholarLocate open access versionFindings
    • C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, ‘‘An introduction to MCMC for machine learning,’’ Mach. Learn., vol. 50, no. 1–2, pp. 5–43, 2003.
      Google ScholarLocate open access versionFindings
    • J.-A. M. Assael, Z. Wang, and N. de Freitas, ‘‘Heteroscedastic treed Bayesian optimisation,’’ 2014. [Online]. Available: http://abs/abs/1410.7172.
      Findings
    • C. Audet, J. J. Dennis, D. W. Moore, A. Booker, and P. D. Frank, ‘‘Surrogatemodel-based method for constrained optimization,’’ in Proc. AIAA/USAF/NASA/ ISSMO Symp. Multidisciplinary Anal. Optim., 2000, DOI: 10.2514/6.2000-4891.
      Locate open access versionFindings
    • J. Audibert, S. Bubeck, and R. Munos, ‘‘Best arm identification in multi-armed bandits,’’ in Proc. Conf. Learn. Theory, 2010, pp. 41–53.
      Google ScholarLocate open access versionFindings
    • P. Auer, ‘‘Using confidence bounds for exploitation-exploration trade-offs,’’ J. Mach. Learn. Res., vol. 3, pp. 397–422, 2003.
      Google ScholarLocate open access versionFindings
    • P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, ‘‘Gambling in a rigged casino: The adversarial multi-armed bandit problem,’’ in Proc. Symp. Found. Comput. Sci., 1995, pp. 322–331.
      Google ScholarLocate open access versionFindings
    • P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, ‘‘The nonstochastic multiarmed bandit problem,’’ SIAM J. Comput., vol. 32, no. 1, pp. 48–77, 2002.
      Google ScholarLocate open access versionFindings
    • J. Azimi, A. Jalali, and X. Fern, ‘‘Hybrid batch Bayesian optimization,’’ in Proc. Int. Conf. Mach. Learn., 2012, pp. 1215–1222.
      Google ScholarLocate open access versionFindings
    • R. Bardenet, M. Brendel, B. Kegl, and M. Sebag, ‘‘Collaborative hyperparameter tuning,’’ in Proc. Int. Conf. Mach. Learn., 2013, pp. 199–207.
      Google ScholarLocate open access versionFindings
    • R. Bardenet and B. Kegl, ‘‘Surrogating the surrogate: Accelerating Gaussian-processbased global optimization with a mixture cross-entropy algorithm,’’ in Proc. Int. Conf. Mach. Learn., 2010, pp. 55–62.
      Google ScholarLocate open access versionFindings
    • T. Bartz-Beielstein, C. Lasarczyk, and M. Preuss, ‘‘Sequential parameter optimization,’’ in Proc. IEEE Congr. Evol. Comput., 2005, pp. 773–780.
      Google ScholarLocate open access versionFindings
    • R. Benassi, J. Bect, and E. Vazquez, ‘‘Robust Gaussian process-based global optimization using a fully Bayesian expected improvement criterion,’’ in Learning and Intelligent Optimization, vol. 6683, C. Coello, Ed. Berlin, Germany: Springer-Verlag, 2011, pp. 176–190.
      Google ScholarLocate open access versionFindings
    • J. Bergstra, R. Bardenet, Y. Bengio, and B. Kegl, ‘‘Algorithms for hyper-parameter optimization,’’ in Proc. Adv. Neural Inf. Process. Syst., 2011, pp. 2546–2554.
      Google ScholarLocate open access versionFindings
    • J. Bergstra and Y. Bengio, ‘‘Random search for hyper-parameter optimization,’’ J. Mach. Learn. Res., vol. 13, pp. 281–305, 2012.
      Google ScholarLocate open access versionFindings
    • J. Bergstra, B. Komer, C. Eliasmith, and D. Warde-Farley, ‘‘Preliminary evaluation of hyperopt algorithms on HPOLib,’’ in Proc. Int. Conf. Mach. Learn. AutoML Workshop, 2014.
      Google ScholarLocate open access versionFindings
    • J. Bergstra, D. Yamins, and D. D. Cox, ‘‘Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures,’’ in Proc. Int. Conf. Mach. Learn., 2013, pp. 115–123.
      Google ScholarLocate open access versionFindings
    • S. Bochner, Lectures on Fourier Integrals. Princeton, NJ, USA: Princeton Univ. Press, 1959.
      Google ScholarFindings
    • E. V. Bonilla, K. M. A. Chai, and C. K. I. Williams, ‘‘Multi-task Gaussian process prediction,’’ in Proc. Adv. Neural Inf. Process. Syst., 2008, pp. 153–160.
      Google ScholarLocate open access versionFindings
    • L. Bornn, G. Shaddick, and J. V. Zidek, ‘‘Modeling nonstationary processes through dimension expansion,’’ J. Amer. Stat. Soc., vol. 107, no. 497, 2012, pp. 281–289.
      Google ScholarLocate open access versionFindings
    • P. Boyle, ‘‘Gaussian processes for regression and optimisation,’’ Ph.D. dissertation, Victoria Univ. Wellington, Wellington, New Zealand, 2007.
      Google ScholarFindings
    • L. Breiman, ‘‘Random forests,’’ Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
      Google ScholarLocate open access versionFindings
    • L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. New York, NY, USA: Wadsworth and Brooks, 1984.
      Google ScholarFindings
    • E. Brochu, T. Brochu, and N. de Freitas, ‘‘A Bayesian interactive optimization approach to procedural animation design,’’ in Proc. ACM SIGGRAPH/Eurograph. Symp. Comput. Animat., 2010, pp. 103–112.
      Google ScholarLocate open access versionFindings
    • E. Brochu, V. M. Cora, and N. de Freitas, ‘‘A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning,’’ Dept. Comput. Sci., Univ. British Columbia, Vancouver, BC, Canada, Tech. Rep. UBC TR-2009-23, 2009.
      Google ScholarLocate open access versionFindings
    • E. Brochu, N. de Freitas, and A. Ghosh, ‘‘Active preference learning with discrete choice data,’’ in Proc. Adv. Neural Inf. Process. Syst., 2007, pp. 409–416.
      Google ScholarLocate open access versionFindings
    • S. Bubeck and N. Cesa-Bianchi, ‘‘Regret analysis of stochastic and nonstochastic multi-armed bandit problems,’’ Found. Trends Mach. Learn., vol. 5, no. 1, pp. 1–122, 2012.
      Google ScholarLocate open access versionFindings
    • S. Bubeck, R. Munos, and G. Stoltz, ‘‘Pure exploration in multi-armed bandits problems,’’ in Proc. Int. Conf. Algorithmic Learn. Theory, 2009, pp. 23–37.
      Google ScholarLocate open access versionFindings
    • S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvari, ‘‘X-armed bandits,’’ J. Mach. Learn. Res., vol. 12, pp. 1655–1695, 2011.
      Google ScholarLocate open access versionFindings
    • A. D. Bull, ‘‘Convergence rates of efficient global optimization algorithms,’’ J. Mach. Learn. Res., vol. 12, pp. 2879–2904, 2011.
      Google ScholarLocate open access versionFindings
    • D. Busby, ‘‘Hierarchical adaptive experimental design for Gaussian process emulators,’’ Reliab. Eng. Syst. Safety, vol. 94, no. 7, pp. 1183–1193, Jul. 2009.
      Google ScholarLocate open access versionFindings
    • R. Calandra, J. Peters, C. E. Rasmussen, and M. P. Deisenroth, Manifold Gaussian processes for regression, 2014. [Online]. Available: arXiv:1402.5876.
      Findings
    • A. Carpentier and R. Munos, ‘‘Bandit theory meets compressed sensing for high dimensional stochastic linear bandit,’’ in Proc. 15th Int. Conf. Artif. Intell. Stat., 2012, pp. 190–198.
      Google ScholarLocate open access versionFindings
    • N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, Games. New York, NY, USA: Cambridge Univ. Press, 2006.
      Google ScholarFindings
    • K. Chaloner and I. Verdinelli, ‘‘Bayesian experimental design: A review,’’ Stat. Sci., vol. 10, no. 3, pp. 273–304, 1995.
      Google ScholarLocate open access versionFindings
    • O. Chapelle and L. Li, ‘‘An empirical evaluation of Thompson sampling,’’ in Proc. Adv. Neural Inf. Process. Syst., 2011, pp. 2249–2257.
      Google ScholarLocate open access versionFindings
    • B. Chen, R. Castro, and A. Krause, ‘‘Joint optimization and variable selection of highdimensional Gaussian processes,’’ in Proc. Int. Conf. Mach. Learn., 2012, pp. 1423–1430.
      Google ScholarLocate open access versionFindings
    • S. Clark, ‘‘Parallel machine learning algorithms in bioinformatics and global optimization,’’ Ph.D. dissertation, Cornell Univ., Ithaca, NY, USA, 2012.
      Google ScholarFindings
    • E. Contal, V. Perchet, and N. Vayatis, ‘‘Gaussian process optimization with mutual information,’’ in Proc. Int. Conf. Mach. Learn., 2013, pp. 253–261.
      Google ScholarLocate open access versionFindings
    • A. Criminisi, J. Shotton, and E. Konukoglu, ‘‘Decision forests: A unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning,’’ Found. Trends Comput. Graph. Vis., vol. 7, pp. 81–227, 2011.
      Google ScholarLocate open access versionFindings
    • N. de Freitas, A. Smola, and M. Zoghi, ‘‘Exponential regret bounds for Gaussian process bandits with deterministic observations,’’ in Proc. Int. Conf. Mach. Learn., 2012, pp. 1743–1750.
      Google ScholarLocate open access versionFindings
    • M. Denil, L. Bazzani, H. Larochelle, and N. de Freitas, ‘‘Learning where to attend with deep architectures for image tracking,’’ Neural Comput., vol. 24, no. 8, pp. 2151–2184, 2012.
      Google ScholarLocate open access versionFindings
    • T. Desautels, A. Krause, and J. Burdick, ‘‘Parallelizing exploration-exploitation tradeoffs with Gaussian process bandit optimization,’’ J. Mach. Learn. Res., vol. 15, pp. 4053–4103, 2014.
      Google ScholarLocate open access versionFindings
    • J. Djolonga, A. Krause, and V. Cevher, ‘‘High dimensional Gaussian process bandits,’’ in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 1025–1033.
      Google ScholarLocate open access versionFindings
    • T. Domhan, J. T. Springenberg, and F. Hutter, ‘‘Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves,’’ in Proc. 24th Int. Joint Conf. Artif. Intell., Jul. 2015, pp. 3460–3468.
      Google ScholarLocate open access versionFindings
    • T. Domhan, T. Springenberg, and F. Hutter, ‘‘Extrapolating learning curves of deep neural networks,’’ in Proc. Int. Conf. Mach. Learn. AutoML Workshop, 2014.
      Google ScholarLocate open access versionFindings
    • M. Feurer, T. Springenberg, and F. Hutter, ‘‘Initializing Bayesian hyperparameter optimization via meta-learning,’’ in Proc. Nat. Conf. Artif. Intell., 2015, pp. 1128–1135.
      Google ScholarLocate open access versionFindings
    • V. Gabillon, M. Ghavamzadeh, and A. Lazaric, ‘‘Best arm identification: A unified approach to fixed budget and fixed confidence,’’ in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 3212–3220.
      Google ScholarLocate open access versionFindings
    • V. Gabillon, M. Ghavamzadeh, A. Lazaric, and S. Bubeck, ‘‘Multi-bandit best arm identification,’’ in Proc. Adv. Neural Inf. Process. Syst., 2011, pp. 2222–2230.
      Google ScholarLocate open access versionFindings
    • J. R. Gardner, M. J. Kusner, Z. Xu, K. Q. Weinberger, and J. P. Cunningham, ‘‘Bayesian optimization with inequality constraints,’’ in Proc. Int. Conf. Mach. Learn., 2014, pp. 937–945.
      Google ScholarLocate open access versionFindings
    • R. Garnett, M. A. Osborne, and P. Hennig, ‘‘Active learning of linear embeddings for Gaussian processes,’’ in Proc. Conf. Uncertainty Artif. Intell., 2014, pp. 24–33.
      Google ScholarLocate open access versionFindings
    • R. Garnett, M. A. Osborne, S. Reece, A. Rogers, and S. J. Roberts, ‘‘Sequential Bayesian prediction in the presence of changepoints and faults,’’ Comput. J., vol. 53, no. 9, pp. 1430–1446, 2010.
      Google ScholarLocate open access versionFindings
    • R. Garnett, M. A. Osborne, and S. J. Roberts, ‘‘Bayesian optimization for sensor set selection,’’ in Proc. ACM/IEEE Int. Conf. Inf. Process. Sensor Netw., 2010, pp. 209–219.
      Google ScholarLocate open access versionFindings
    • M. A. Gelbart, J. Snoek, and R. P. Adams, ‘‘Bayesian optimization with unknown constraints,’’ in Proc. Conf. Uncertainty Artif. Intell., 2014, pp. 250–259.
      Google ScholarLocate open access versionFindings
    • D. Ginsbourger, R. Le Riche, and L. Carraro, ‘‘Kriging is well-suited to parallelize optimization,’’ in Proc. Comput. Intell. Expensive Optim. Problems, 2010, pp. 131–162.
      Google ScholarLocate open access versionFindings
    • D. Ginsbourger and R. L. Riche, ‘‘Dealing with asynchronicity in parallel Gaussian process based global optimization,’’ 2010. [Online]. Available: http://hal.archivesouvertes.fr/hal-00507632.
      Findings
    • J. C. Gittins, ‘‘Bandit processes and dynamic allocation indices J. Roy. Stat. Soc. B, Methodol., vol. 2, pp. 148–177, 1979.
      Google ScholarLocate open access versionFindings
    • P. Goovaerts, Geostatistics for Natural Resources Evaluation. Oxford, U.K.: Oxford Univ. Press, 1997.
      Google ScholarFindings
    • R. B. Gramacy et al., ‘‘Modeling an augmented Lagrangian for improved blackbox constrained optimization,’’ 2014. [Online]. Available: arXiv:1403.4890.
      Findings
    • R. B. Gramacy and H. K. Lee, ‘‘Optimization under unknown constraints,’’ 2010. [Online]. Available: arXiv:1004.4027.
      Findings
    • R. B. Gramacy, H. K. H. Lee, and W. G. Macready, ‘‘Parameter space exploration with Gaussian process trees,’’ in Proc. Int. Conf. Mach. Learn., 2004, pp. 45–52.
      Google ScholarLocate open access versionFindings
    • S. Grunewalder, J. Audibert, M. Opper, and J. Shawe-Taylor, ‘‘Regret bounds for Gaussian process bandit problems,’’ in Proc. 13th Int. Conf. Artif. Intell. Stat., 2010, pp. 273–280.
      Google ScholarLocate open access versionFindings
    • F. Hamze, Z. Wang, and N. de Freitas, ‘‘Self-avoiding random dynamics on integer complex systems,’’ ACM Trans. Model. Comput. Simul., vol. 23, no. 1, p. 9, 2013.
      Google ScholarLocate open access versionFindings
    • N. Hansen and A. Ostermeier, ‘‘Completely derandomized self-adaptation in evolution strategies,’’ Evol. Comput, vol. 9, no. 2, pp. 159–195, 2001.
      Google ScholarLocate open access versionFindings
    • P. Hennig and C. Schuler, ‘‘Entropy search for information-efficient global optimization,’’ J. Mach. Learn. Res., vol. 13, no. 1, pp. 1809–1837, 2012.
      Google ScholarLocate open access versionFindings
    • J. M. Hernandez-Lobato, M. A. Gelbart, M. W. Hoffman, R. P. Adams, and Z. Ghahramani, ‘‘Predictive entropy search for Bayesian optimization with unknown constraints,’’ in Proc. Int. Conf. Mach. Learn., 2015, pp. 1699–1707.
      Google ScholarLocate open access versionFindings
    • J. M. Hernandez-Lobato, M. W. Hoffman, and Z. Ghahramani, ‘‘Predictive entropy search for efficient global optimization of black-box functions,’’ in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 918–926.
      Google ScholarLocate open access versionFindings
    • D. Higdon, J. Swall, and J. Kern, ‘‘Non-stationary spatial modeling,’’ Bayesian Stat., vol. 6, 1998, pp. 761–768.
      Google ScholarLocate open access versionFindings
    • G. E. Hinton and R. Salakhutdinov, ‘‘Using deep belief nets to learn covariance kernels for Gaussian processes,’’ in Proc. Adv. Neural Inf. Process. Syst., 2008, pp. 1249–1256.
      Google ScholarLocate open access versionFindings
    • M. Hoffman, B. Shahriari, and N. de Freitas, ‘‘On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning,’’ in Proc. 17th Int. Conf. Artif. Intell. Stat., 2014, pp. 365–374.
      Google ScholarLocate open access versionFindings
    • M. W. Hoffman, E. Brochu, and N. de Freitas, ‘‘Portfolio allocation for Bayesian optimization,’’ in Proc. Conf. Uncertainty Artif. Intell., 2011, pp. 327–336.
      Google ScholarLocate open access versionFindings
    • H. H. Hoos, ‘‘Programming by optimization,’’ Commun. ACM, vol. 55, no. 2, pp. 70–80, 2012.
      Google ScholarLocate open access versionFindings
    • D. Huang, T. Allen, W. Notz, and N. Zeng, ‘‘Global optimization of stochastic black-box systems via sequential Kriging meta-models,’’ J. Global Optim., vol. 34, no. 3, pp. 441–466, 2006.
      Google ScholarLocate open access versionFindings
    • F. Hutter, ‘‘Automated configuration of algorithms for solving hard computational problems,’’ Ph.D. dissertation, Univ. British Columbia, Vancouver, BC, Canada, 2009.
      Google ScholarFindings
    • F. Hutter, H. Hoos, and K. Leyton-Brown, ‘‘Identifying key algorithm parameters and instance features using forward selection,’’ in Learning and Intelligent Optimization, vol. 7997, Berlin, Germany: Springer-Verlag, 2013, pp. 364–381.
      Google ScholarLocate open access versionFindings
    • F. Hutter, H. H. Hoos, and K. Leyton-Brown, ‘‘Automated configuration of mixed integer programming solvers,’’ in Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, Berlin, Germany: Springer-Verlag, 2010, pp. 186–202.
      Google ScholarFindings
    • F. Hutter, H. H. Hoos, and K. Leyton-Brown, ‘‘Sequential model-based optimization for general algorithm configuration,’’ Learning and Intelligent Optimization, Berlin, Germany: Springer-Verlag, 2011, pp. 507–523.
      Google ScholarFindings
    • F. Hutter, H. H. Hoos, and K. Leyton-Brown, ‘‘Parallel algorithm configuration,’’ Learning and Intelligent Optimization, Berlin, Germany: Springer-Verlag, 2012, pp. 55–70.
      Google ScholarFindings
    • D. Jones, ‘‘A taxonomy of global optimization methods based on response surfaces,’’ J. Global Optim., vol. 21, no. 4, pp. 345–383, 2001.
      Google ScholarLocate open access versionFindings
    • D. Jones, M. Schonlau, and W. Welch, ‘‘Efficient global optimization of expensive black-box functions,’’ J. Global Optim., vol. 13, no. 4, pp. 455–492, 1998.
      Google ScholarLocate open access versionFindings
    • D. R. Jones, C. D. Perttunen, and B. E. Stuckman, ‘‘Lipschitzian optimization without the Lipschitz constant,’’ J. Optim. Theory Appl., vol. 79, no. 1, pp. 157–181, 1993.
      Google ScholarLocate open access versionFindings
    • E. Kaufmann, O. Cappe, and A. Garivier, ‘‘On Bayesian upper confidence bounds for bandit problems,’’ in Proc. Int. Conf. Artif. Intell. Stat., 2012, pp. 592–600.
      Google ScholarLocate open access versionFindings
    • E. Kaufmann, N. Korda, and R. Munos, ‘‘Thompson sampling: An asymptotically optimal finite-time analysis,’’ in Algorithmic Learning Theory, vol. 7568, Berlin, Germany: Springer-Verlag, 2012, pp. 199–213.
      Google ScholarLocate open access versionFindings
    • K. Kersting, C. Plagemann, P. Pfaff, and W. Burgard, ‘‘Most likely heteroscedastic Gaussian process regression,’’ in Proc. Int. Conf. Mach. Learn., 2007, pp. 393–400.
      Google ScholarLocate open access versionFindings
    • L. Kocsis and C. Szepesvari, ‘‘Bandit based Monte-Carlo planning,’’ in Proc. Eur. Conf. Mach. Learn., 2006, pp. 282–293.
      Google ScholarLocate open access versionFindings
    • R. Kohavi, R. Longbotham, D. Sommerfield, and R. M. Henne, ‘‘Controlled experiments on the web: Survey and practical guide,’’ Data Mining Knowl. Disc., vol. 18, no. 1, pp. 140–181, 2009.
      Google ScholarLocate open access versionFindings
    • A. Krause and C. S. Ong, ‘‘Contextual Gaussian process bandit optimization,’’ in Proc. Adv. Neural Inf. Process. Syst., 2011, pp. 2447–2455.
      Google ScholarLocate open access versionFindings
    • D. G. Krige, ‘‘A statistical approach to some basic mine valuation problems on the witwatersrand,’’ in J. Chem. Metallurgical Mining Soc. South Africa, vol. 94, no. 3, 1951, pp. 95–111.
      Google ScholarLocate open access versionFindings
    • H. J. Kushner, ‘‘A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise,’’ J. Fluids Eng., vol. 86, no. 1, pp. 97–106, 1964.
      Google ScholarLocate open access versionFindings
    • T. L. Lai and H. Robbins, ‘‘Asymptotically efficient adaptive allocation rules,’’ Adv. Appl. Math., vol. 6, no. 1, pp. 4–22, 1985.
      Google ScholarLocate open access versionFindings
    • M. Lazaro-Gredilla and A. R. Figueiras-Vidal, ‘‘Marginalized neural network mixtures for large-scale regression,’’ IEEE Trans. Neural Netw., vol. 21, no. 8, pp. 1345–1351, Aug. 2010.
      Google ScholarLocate open access versionFindings
    • M. Lazaro-Gredilla, J. Quinnonero-Candela, C. E. Rasmussen, and A. R. Figueiras-Vidal, ‘‘Sparse spectrum Gaussian process regression,’’ J. Mach. Learn. Res., vol. 11, pp. 1865–1881, 2010.
      Google ScholarLocate open access versionFindings
    • Q. V. Le, A. J. Smola, and S. Canu, ‘‘Heteroscedastic Gaussian process regression,’’ in Proc. Int. Conf. Mach. Learn., 2005, pp. 489–496.
      Google ScholarLocate open access versionFindings
    • K. Leyton-Brown, E. Nudelman, and Y. Shoham, ‘‘Learning the empirical hardness of optimization problems: The case of combinatorial auctions,’’ in Principles and Practice of Constraint Programming, ser. Lecture Notes in Computer Science, Berlin, Germany: Springer-Verlag, 2002, pp. 556–572.
      Google ScholarLocate open access versionFindings
    • L. Li, W. Chu, J. Langford, and R. E. Schapire, ‘‘A contextual-bandit approach to personalized news article recommendation,’’ in Proc. World Wide Web, 2010, pp. 661–670.
      Google ScholarLocate open access versionFindings
    • D. V. Lindley, ‘‘On a measure of the information provided by an experiment,’’ Ann. Math. Stat., vol. 27, no. 4, pp. 986–1005, 1956.
      Google ScholarLocate open access versionFindings
    • D. Lizotte, ‘‘Practical Bayesian optimization,’’ Ph.D. dissertation, Univ. Alberta, Edmonton, AB, Canada, 2008.
      Google ScholarFindings
    • D. Lizotte, R. Greiner, and D. Schuurmans, ‘‘An experimental methodology for response surface optimization methods,’’ J. Global Optim., vol. 53, pp. 1–38, 2011.
      Google ScholarLocate open access versionFindings
    • D. Lizotte, T. Wang, M. Bowling, and D. Schuurmans, ‘‘Automatic gait optimization with Gaussian process regression,’’ in Proc. Int. Joint Conf. Artif. Intell., 2007, pp. 944–949.
      Google ScholarLocate open access versionFindings
    • M. Locatelli, ‘‘Bayesian algorithms for one-dimensional global optimization,’’ J. Global Optim., vol. 10, pp. 57–76, 1997.
      Google ScholarLocate open access versionFindings
    • M. Lzaro-gredilla and M. K. Titsias, ‘‘Variational heteroscedastic Gaussian process regression,’’ in Proc. Int. Conf. Mach. Learn., 2011, pp. 841–848, ACM.
      Google ScholarLocate open access versionFindings
    • O. Madani, D. Lizotte, and R. Greiner, ‘‘Active model selection,’’ in Proc. Conf. Uncertainty Artif. Intell., 2004, pp. 357–365.
      Google ScholarLocate open access versionFindings
    • N. Mahendran, Z. Wang, F. Hamze, and N. de Freitas, ‘‘Adaptive MCMC with Bayesian optimization,’’ J. Mach. Learn. Res., vol. 22, pp. 751–760, 2012.
      Google ScholarLocate open access versionFindings
    • R. Marchant and F. Ramos, ‘‘Bayesian optimisation for intelligent environmental monitoring,’’ in NIPS Workshop Bayesian Optim. Decision Making, 2012.
      Google ScholarLocate open access versionFindings
    • O. Maron and A. W. Moore, ‘‘Hoeffding races: Accelerating model selection search for classification and function approximation,’’ Robot. Inst., vol. 6, pp. 59–66, 1993.
      Google ScholarLocate open access versionFindings
    • R. Martinez-Cantin, N. de Freitas, E. Brochu, J. Castellanos, and A. Doucet, ‘‘A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot,’’ Autonom. Robots, vol. 27, no. 2, pp. 93–103, 2009.
      Google ScholarLocate open access versionFindings
    • J. Martinez, J. J. Little, and N. de Freitas, ‘‘Bayesian optimization with an empirical hardness model for approximate nearest neighbour search,’’ in Proc. IEEE Winter Conf. Appl. Comput. Vis., 2014, pp. 588–595.
      Google ScholarLocate open access versionFindings
    • R. Martinez-Cantin, N. de Freitas, A. Doucet, and J. A. Castellanos, ‘‘Active policy learning for robot planning and exploration under uncertainty,’’ in Proc. Robot. Sci. Syst., pp. 321–328, 2007.
      Google ScholarLocate open access versionFindings
    • G. Matheron, ‘‘The theory of regionalized variables and its applications,’’ in Cahier du Centre de Morphologie Mathematique, Ecoles des Mines, 1971.
      Google ScholarFindings
    • B. C. May, N. Korda, A. Lee, and D. S. Leslie, ‘‘Optimistic Bayesian sampling in contextual bandit problems,’’ Stat. Group, Schl. Math., Univ. Bristol, Bristol, U.K., Tech. Rep. 11:01, 2011.
      Google ScholarLocate open access versionFindings
    • V. Mnih, C. Szepesvari, and J.-Y. Audibert, ‘‘Empirical Bernstein stopping,’’ in Proc. Int. Conf. Mach. Learn., 2008, pp. 672–679.
      Google ScholarLocate open access versionFindings
    • J. Mockus, ‘‘Application of Bayesian approach to numerical methods of global and stochastic optimization,’’ J. Global Optim., vol. 4, no. 4, pp. 347–365, 1994.
      Google ScholarLocate open access versionFindings
    • J. Mockus, V. Tiesis, and A. Zilinskas, ‘‘The application of Bayesian methods for seeking the extremum,’’ in Toward Global Optimization, vol. 2, L. Dixon and G. Szego, Eds. Amsterdam, The Netherlands: Elsevier, 1978.
      Google ScholarLocate open access versionFindings
    • R. Munos, ‘‘Optimistic optimization of a deterministic function without the knowledge of its smoothness,’’ in Proc. Adv. Neural Inf. Process. Syst., 2011, pp. 783–791.
      Google ScholarLocate open access versionFindings
    • R. Munos, ‘‘From bandits to Monte-Carlo tree search: The optimistic principle applied to optimization and planning,’’ INRIA Lille, France, Tech. Rep. hal-00747575, 2014.
      Google ScholarLocate open access versionFindings
    • R. M. Neal, ‘‘Bayesian learning for neural networks,’’ Ph.D. dissertation, Univ. Toronto, Toronto, ON, Canada, 1995.
      Google ScholarFindings
    • J. Nelder and R. Wedderburn, ‘‘Generalized linear models,’’ J. Roy. Stat. Soc. A, vol. 135, no. 3, pp. 370–384, 1972.
      Google ScholarLocate open access versionFindings
    • M. A. Osborne, R. Garnett, and S. J. Roberts, ‘‘Gaussian processes for global optimisation Learning and Intelligent Optimization, Berlin, Germany: Springer-Verlag, pp. 1–15, 2009.
      Google ScholarFindings
    • C. Paciorek and M. Schervish, ‘‘Nonstationary covariance functions for Gaussian process regression,’’ in Proc. Adv. Neural Inf. Process. Syst., 2004, vol. 16, pp. 273–280.
      Google ScholarLocate open access versionFindings
    • V. Picheny and D. Ginsbourger, ‘‘A nonstationary space-time Gaussian process model for partially converged simulations,’’ SIAM/ASA J. Uncertainty Quantif., vol. 1, no. 1, pp. 57–78, 2013.
      Google ScholarLocate open access versionFindings
    • J. C. Pinheiro and D. M. Bates, ‘‘Unconstrained parametrizations for variance-covariance matrices,’’ Stat. Comput., vol. 6, no. 3, pp. 289–296, 1996.
      Google ScholarLocate open access versionFindings
    • J. Qui nonero-Candela and C. E. Rasmussen, ‘‘A unifying view of sparse approximate Gaussian process regression,’’ J. Mach. Learn. Res., vol. 6, pp. 1939–1959, 2005.
      Google ScholarLocate open access versionFindings
    • A. Rahimi and B. Recht, ‘‘Random features for large-scale kernel machines,’’ in Proc. Adv. Neural Inf. Process. Syst., 2007, pp. 1177–1184.
    • C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. Cambridge, MA, USA: MIT Press, 2006.
    • D. Russo and B. Van Roy, ‘‘Learning to optimize via posterior sampling,’’ Math. Oper. Res., vol. 39, no. 4, pp. 1221–1243, 2014.
    • J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn, ‘‘Design and analysis of computer experiments,’’ Stat. Sci., vol. 4, no. 4, pp. 409–423, 1989.
    • P. D. Sampson and P. Guttorp, ‘‘Nonparametric estimation of nonstationary spatial covariance structure,’’ J. Amer. Stat. Assoc., vol. 87, no. 417, pp. 108–119, 1992.
    • T. J. Santner, B. Williams, and W. Notz, The Design and Analysis of Computer Experiments. New York, NY, USA: Springer-Verlag, 2003.
    • M. J. Sasena, ‘‘Flexibility and efficiency enhancement for constrained global design optimization with Kriging approximations,’’ Ph.D. dissertation, Univ. Michigan, Ann Arbor, MI, USA, 2002.
    • A. M. Schmidt and A. O’Hagan, ‘‘Bayesian inference for nonstationary spatial covariance structures via spatial deformations,’’ J. Roy. Stat. Soc. B, vol. 65, pp. 743–758, 2003.
    • M. Schonlau, ‘‘Computer experiments and global optimization,’’ Ph.D. dissertation, Univ. Waterloo, Waterloo, ON, Canada, 1997.
    • M. Schonlau, W. J. Welch, and D. R. Jones, ‘‘Global versus local search in constrained optimization of computer models,’’ Lecture Notes-Monograph Series, vol. 34, pp. 11–25, 1998.
    • S. L. Scott, ‘‘A modern Bayesian look at the multi-armed bandit,’’ Appl. Stochastic Models Business Ind., vol. 26, no. 6, pp. 639–658, 2010.
    • M. Seeger, Y.-W. Teh, and M. I. Jordan, ‘‘Semiparametric latent factor models,’’ in Proc. Int. Conf. Artif. Intell. Stat., 2005, pp. 333–340.
    • M. Seeger, C. Williams, and N. Lawrence, ‘‘Fast forward selection to speed up sparse Gaussian process regression,’’ in Proc. Artif. Intell. Stat. 9, 2003, pp. 1–8.
    • A. Shah, A. G. Wilson, and Z. Ghahramani, ‘‘Student-t processes as alternatives to Gaussian processes,’’ in Proc. Int. Conf. Artif. Intell. Stat., 2014, pp. 877–885.
    • B. Shahriari, Z. Wang, M. W. Hoffman, A. Bouchard-Cote, and N. de Freitas, ‘‘An entropy search portfolio,’’ in Proc. NIPS Workshop Bayesian Optim., 2014.
    • E. Snelson and Z. Ghahramani, ‘‘Sparse Gaussian processes using pseudo-inputs,’’ in Proc. Adv. Neural Inf. Process. Syst., 2005, pp. 1257–1264.
    • E. Snelson, C. E. Rasmussen, and Z. Ghahramani, ‘‘Warped Gaussian processes,’’ in Proc. Adv. Neural Inf. Process. Syst., 2003, pp. 337–344.
    • J. Snoek, ‘‘Bayesian optimization and semiparametric models with applications to assistive technology,’’ Ph.D. dissertation, Univ. Toronto, Toronto, ON, Canada, 2013.
    • J. Snoek, H. Larochelle, and R. P. Adams, ‘‘Practical Bayesian optimization of machine learning algorithms,’’ in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 2951–2959.
    • J. Snoek et al., ‘‘Scalable Bayesian optimization using deep neural networks,’’ in Proc. Int. Conf. Mach. Learn., 2015, pp. 2171–2180.
    • J. Snoek, K. Swersky, R. S. Zemel, and R. P. Adams, ‘‘Input warping for Bayesian optimization of non-stationary functions,’’ in Proc. Int. Conf. Mach. Learn., 2014, pp. 1674–1682.
    • N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger, ‘‘Gaussian process optimization in the bandit setting: No regret and experimental design,’’ in Proc. Int. Conf. Mach. Learn., 2010, pp. 1015–1022.
    • K. Swersky, D. Duvenaud, J. Snoek, F. Hutter, and M. A. Osborne, ‘‘Raiders of the lost architecture: Kernels for Bayesian optimization in conditional parameter spaces,’’ 2014. [Online]. Available: arXiv:1409.4011.
    • K. Swersky, J. Snoek, and R. P. Adams, ‘‘Multi-task Bayesian optimization,’’ in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 2004–2012.
    • K. Swersky, J. Snoek, and R. P. Adams, ‘‘Freeze-thaw Bayesian optimization,’’ 2014. [Online]. Available: arXiv:1406.3896.
    • W. R. Thompson, ‘‘On the likelihood that one unknown probability exceeds another in view of the evidence of two samples,’’ Biometrika, vol. 25, no. 3/4, pp. 285–294, 1933.
    • C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, ‘‘Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms,’’ in Proc. Knowl. Disc. Data Mining, 2013, pp. 847–855.
    • M. K. Titsias, ‘‘Variational learning of inducing variables in sparse Gaussian processes,’’ in Proc. Int. Conf. Artif. Intell. Stat., 2009, pp. 567–574.
    • H. P. Vanchinathan, I. Nikolic, F. De Bona, and A. Krause, ‘‘Explore-exploit in top-N recommender systems via Gaussian processes,’’ in Proc. 8th ACM Conf. Recommender Syst., 2014, pp. 225–232.
    • E. Vazquez and J. Bect, ‘‘Convergence properties of the expected improvement algorithm with fixed mean and covariance functions,’’ J. Stat. Planning Inference, vol. 140, no. 11, pp. 3088–3095, 2010.
    • J. Villemonteix, E. Vazquez, and E. Walter, ‘‘An informational approach to the global optimization of expensive-to-evaluate functions,’’ J. Global Optim., vol. 44, no. 4, pp. 509–534, 2009.
    • Z. Wang and N. de Freitas, ‘‘Theoretical analysis of Bayesian optimisation with unknown Gaussian process hyper-parameters,’’ 2014. [Online]. Available: arXiv:1406.7758.
    • Z. Wang, B. Shakibi, L. Jin, and N. de Freitas, ‘‘Bayesian multi-scale optimistic optimization,’’ in Proc. Int. Conf. Artif. Intell. Stat., 2014, pp. 1005–1014.
    • Z. Wang, M. Zoghi, D. Matheson, F. Hutter, and N. de Freitas, ‘‘Bayesian optimization in high dimensions via random embeddings,’’ in Proc. Int. Joint Conf. Artif. Intell., 2013, pp. 1778–1784.
    • B. J. Williams, T. J. Santner, and W. I. Notz, ‘‘Sequential design of computer experiments to minimize integrated response functions,’’ Statistica Sinica, vol. 10, pp. 1133–1152, 2000.
    • D. Yogatama and G. Mann, ‘‘Efficient transfer learning method for automatic hyperparameter tuning,’’ in Proc. Int. Conf. Artif. Intell. Stat., 2014, pp. 1077–1085.
    • D. Yogatama and N. A. Smith, ‘‘Bayesian optimization of text representations,’’ 2015. [Online]. Available: arXiv:1503.00693.
    • Y. Zhang, K. Sohn, R. Villegas, G. Pan, and H. Lee, ‘‘Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction,’’ in Proc. IEEE Comput. Vis. Pattern Recognit. Conf., 2015, pp. 249–258.
    • A. Zilinskas and J. Zilinskas, ‘‘Global optimization based on a statistical model and simplicial partitioning,’’ Comput. Math. Appl., vol. 44, pp. 957–967, 2002.
    • Ryan P. Adams received the Ph.D. degree in physics from the University of Cambridge, Cambridge, U.K., in 2009.
    • Nando de Freitas received the Ph.D. degree in Bayesian methods for neural networks from Trinity College, Cambridge University, Cambridge, U.K., in 2000.
    • He is a Machine Learning Professor at Oxford University, Oxford, U.K. and a Senior Staff Research Scientist at Google DeepMind, U.K. From 1999 to 2001, he was a Postdoctoral Fellow at the University of California Berkeley, Berkeley, CA, USA, in the artificial intelligence group. He was a Professor at the University of British Columbia, Vancouver, BC, Canada, from 2001 to 2013.
    • Prof. de Freitas is a Fellow of the Canadian Institute For Advanced Research (CIFAR) in the successful Neural Computation and Adaptive Perception program. Among his recent awards are the 2012 Charles A. McDowell Award for Excellence in Research and the 2010 Mathematics of Information Technology and Complex Systems (MITACS) Young Researcher Award.