# Bayesian optimization in high dimensions via random embeddings

IJCAI, pp. 1778-1784, 2013.

Keywords:

high-dimensional problem, random embedding, Bayesian optimization, automatic algorithm configuration, continuous variable, holy grail

Abstract:

Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have …

Introduction

- Let f : X → R be a function on a compact subset X ⊆ R^D. The authors address the following global optimization problem: x* = arg max_{x ∈ X} f(x).

- The authors are interested in objective functions f that may satisfy one or more of the following criteria: they do not have a closed-form expression, are expensive to evaluate, do not have available derivatives, or are non-convex.
- In order to optimize a black-box function f, Bayesian optimization places a prior distribution that captures beliefs about the behavior of f, and updates this prior with sequentially acquired data.
- It iterates the following phases: (1) use the prior to decide at which input x ∈ X to query f ; (2) evaluate f (x); and (3) update the prior based on the new data x, f (x).
- Bayesian optimization methods differ in their choice of prior and in their choice of acquisition function, the criterion used in phase (1) to decide where to query f next
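The three-phase loop above can be sketched concretely. Below is a minimal, illustrative implementation (ours, not the paper's code) that uses a Gaussian-process surrogate with an RBF kernel as the prior and the UCB acquisition rule to choose queries; all function names and hyperparameters are assumptions for the sketch.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.2):
    # Squared-exponential kernel with unit prior variance.
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(X, y, Xq, noise=1e-5):
    # GP posterior mean and standard deviation at query points Xq.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Kq = rbf_kernel(Xq, X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Kq @ alpha
    v = np.linalg.solve(L, Kq.T)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def bayes_opt(f, bounds, n_iter=25, beta=2.0, seed=0):
    # The loop from the text: (1) pick x via the acquisition function,
    # (2) evaluate f(x), (3) update the model with the new observation.
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(3, 1))           # small initial design
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        Xq = rng.uniform(lo, hi, size=(500, 1))    # random candidates
        mu, sd = gp_posterior(X, y, Xq)
        x_next = Xq[np.argmax(mu + beta * sd)]     # UCB acquisition
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmax(y)], y.max()
```

To keep the sketch short, the acquisition step is approximated by scoring a random set of candidate points; practical implementations maximize the acquisition function with a dedicated optimizer.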

Highlights

- Let f : X → R be a function on a compact subset X ⊆ R^D
- Bayesian optimization methods differ in their choice of prior and in their choice of acquisition function
- We evaluate REMBO (Random EMbedding Bayesian Optimization) using a fixed budget of 500 function evaluations spread across multiple interleaved runs; for example, with k = 4 interleaved REMBO runs, each run was allowed only 125 function evaluations
- This paper has shown that it is possible to use random embeddings in Bayesian optimization to optimize functions of high extrinsic dimensionality D, provided that they have low intrinsic dimensionality de
- We confirmed REMBO's independence of D empirically by optimizing low-dimensional functions embedded in high dimensions
- We do not yet know how many practical optimization problems fall within the class of problems where REMBO applies
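The interleaved-runs protocol described above (a fixed budget split across k runs, each with its own random embedding) can be sketched as follows; random search stands in for the inner Bayesian optimizer, and all names and box sizes are illustrative, not the paper's.

```python
import numpy as np

def run_with_embedding(f, D, d, budget, rng):
    # One run: draw a fresh random embedding A and search in the
    # low-dimensional space, mapping points up via clip(A @ y).
    A = rng.normal(size=(D, d))
    ys = rng.uniform(-2.0, 2.0, size=(budget, d))
    return min(f(np.clip(A @ y, -1.0, 1.0)) for y in ys)

def interleaved(f, D, d, budget=500, k=4, seed=0):
    # Split the total budget across k independent runs (125 evaluations
    # each for k = 4) and report the best value found by any of them.
    rng = np.random.default_rng(seed)
    per_run = budget // k
    return min(run_with_embedding(f, D, d, per_run, rng) for _ in range(k))
```

Interleaving hedges against an unlucky embedding: each run draws a different A, so at least one embedding is likely to contain a point close to an optimizer.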

Methods

- The authors used a single robust version of REMBO that automatically sets its GP’s length scale parameter using a variant of maximum likelihood.
- In the experiments of Section 4.1 ("Bayesian Optimization in a Billion Dimensions"), the authors employ a standard de = 2-dimensional benchmark function for Bayesian optimization, embedded in a D-dimensional space.
- The function whose optimum the authors seek is f(x1:D) = g(xi, xj), where g is the Branin function and where the coordinates i and j are selected once using a random permutation.
- To measure the performance of each optimization method, the authors used the optimality gap: the difference between the best function value found and the optimal function value.
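The benchmark construction and the optimality gap can be illustrated in code. The sketch below embeds the Branin function at two randomly chosen coordinates of a D = 25-dimensional input, searches through a random d = 2-dimensional embedding, and reports the optimality gap; random search stands in for the Bayesian optimizer, and the box sizes, sample counts, and names are our assumptions.

```python
import numpy as np

def branin(u, v):
    # Standard Branin benchmark; its global minimum value is ~0.397887,
    # attained e.g. at (pi, 2.275).
    return ((v - 5.1 / (4 * np.pi ** 2) * u ** 2 + 5 / np.pi * u - 6) ** 2
            + 10 * (1 - 1 / (8 * np.pi)) * np.cos(u) + 10)

D, d = 25, 2
rng = np.random.default_rng(0)
i, j = rng.permutation(D)[:2]        # the two "important" coordinates

def f(x):
    # High-dimensional objective: depends only on x[i] and x[j],
    # rescaled from [-1, 1] to the Branin domain [-5, 10] x [0, 15].
    u = 7.5 * (x[i] + 1.0) - 5.0
    v = 7.5 * (x[j] + 1.0)
    return branin(u, v)

A = rng.normal(size=(D, d))          # random embedding matrix

def f_embedded(y):
    # Map the d-dimensional point up to D dimensions, clip to the box.
    return f(np.clip(A @ y, -1.0, 1.0))

ys = rng.uniform(-4.0, 4.0, size=(4000, d))
best = min(f_embedded(y) for y in ys)
gap = best - 0.397887                # optimality gap, as in the paper
```

Because f depends only on two coordinates, the 2-dimensional embedded search faces the same effective problem regardless of how large D is, which is the independence-of-D property the paper verifies empirically.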

Conclusion

- This paper has shown that it is possible to use random embeddings in Bayesian optimization to optimize functions of high extrinsic dimensionality D, provided that they have low intrinsic dimensionality de.
- The authors demonstrated that REMBO achieves excellent performance for optimizing the 47 discrete parameters of a popular mixed integer programming solver, thereby providing further evidence for the observation that, for many problems of great practical interest, the number of important dimensions appears to be much lower than their extrinsic dimensionality.
- The success achieved in the examples presented in this paper is very encouraging.

- Table 1: Optimality gap for the de = 2-dimensional Branin function embedded in D = 25 dimensions, for REMBO variants using a total of 500 function evaluations. The variants differed in the internal dimensionality d and in the number of interleaved runs k (each such run was only allowed 500/k function evaluations). Mean and standard deviation of the optimality gap achieved after 500 function evaluations are shown.

Best Paper

Best Paper of IJCAI, 2013
