Bayesian optimization in high dimensions via random embeddings

IJCAI, pp. 1778-1784, 2013.

Keywords:
high-dimensional problem, random embedding, Bayesian optimization, automatic algorithm configuration, continuous variable, holy grail

Abstract:

Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have ...

Introduction
  • Let f : X → R be a function on a compact subset X ⊆ R^D. The authors address the following global optimization problem: x* = arg max_{x ∈ X} f(x).

    The authors are interested in objective functions f that may satisfy one or more of the following criteria: they do not have a closed-form expression, are expensive to evaluate, do not have available derivatives, or are non-convex.
  • In order to optimize a black-box function f, Bayesian optimization places a prior distribution that captures beliefs about the behavior of f, and updates this prior with sequentially acquired data
  • It iterates the following phases: (1) use the prior, via an acquisition function, to decide at which input x ∈ X to query f; (2) evaluate f(x); and (3) update the prior based on the new data (x, f(x)).
  • Bayesian optimization methods differ in their choice of prior and their choice of acquisition function; a minimal sketch of this loop is given below
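The loop described above can be made concrete with a short sketch. This is only an illustration, not the authors' implementation: a zero-mean Gaussian-process surrogate with a fixed squared-exponential kernel and an expected-improvement acquisition stand in for whichever prior and acquisition function a given method uses, and all names (`gp_posterior`, `expected_improvement`, `bayes_opt`) are hypothetical.

```python
# Minimal Bayesian-optimization loop (illustrative sketch only).
import numpy as np
from scipy.stats import norm

def kernel(A, B, ell=0.3):
    """Squared-exponential kernel with unit signal variance."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / ell**2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean and standard deviation at candidate points Xs."""
    K = kernel(X, X) + noise * np.eye(len(X))
    Ks = kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v**2, 0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(f, bounds, n_init=5, n_iter=30, seed=0):
    """Run the three-phase loop: propose x, evaluate f(x), update the model."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    X = rng.uniform(lo, hi, size=(n_init, len(bounds)))       # initial design
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        cand = rng.uniform(lo, hi, size=(2000, len(bounds)))  # (1) decide where to query
        mu, sigma = gp_posterior(X, y, cand)
        x_next = cand[np.argmax(expected_improvement(mu, sigma, y.max()))]
        y_next = f(x_next)                                    # (2) evaluate f
        X, y = np.vstack([X, x_next]), np.append(y, y_next)   # (3) update with (x, f(x))
    return X[np.argmax(y)], y.max()
```

Practical Bayesian optimization packages additionally fit the kernel hyperparameters (for instance by maximum likelihood, as the paper's REMBO variant does for its length scale) and optimize the acquisition function more carefully than by random candidate sampling.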
Highlights
  • Let f : X → R be a function on a compact subset X ⊆ R^D
  • Bayesian optimization methods differ in their choice of prior and their choice of acquisition function
  • We evaluate Random EMbedding Bayesian Optimization (REMBO) using a fixed budget of 500 function evaluations spread across multiple interleaved runs; for example, when using k = 4 interleaved REMBO runs, each run was allowed only 125 function evaluations
  • This paper has shown that it is possible to use random embeddings in Bayesian optimization to optimize functions of high extrinsic dimensionality D, provided that they have low intrinsic dimensionality d_e
  • We confirmed REMBO’s independence of D empirically by optimizing low-dimensional functions embedded in high dimensions
  • We do not yet know how many practical optimization problems fall within the class of problems where REMBO applies
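The construction behind these highlights is the random embedding itself: REMBO optimizes over a low-dimensional point y and maps it into the original space through a random Gaussian matrix A ∈ R^{D×d}, so the expensive function is only ever evaluated at x = Ay (projected back onto the box X if Ay falls outside it). The sketch below illustrates that mapping under our own conventions; the helper name `make_rembo_objective`, the unit box [-1, 1]^D, and the coordinate-wise clipping are assumptions of this sketch rather than the paper's exact choices.

```python
import numpy as np

def make_rembo_objective(f, D, d, lo=-1.0, hi=1.0, seed=0):
    """Wrap a D-dimensional objective f so it can be optimized over d dimensions.

    A has i.i.d. standard-normal entries; clipping A @ y onto the box
    [lo, hi]^D is one simple way to keep evaluations feasible (an
    assumption of this sketch, not necessarily the paper's projection).
    """
    A = np.random.default_rng(seed).normal(size=(D, d))   # random embedding, drawn once
    def f_low(y):
        x = np.clip(A @ y, lo, hi)   # embed y in R^D, then project onto X
        return f(x)
    return f_low
```

Any low-dimensional optimizer can then be run on `f_low`; the paper restricts y to a small box (on the order of [-√d, √d]^d) and applies Bayesian optimization there, so the surrogate model only has to cover d dimensions rather than D.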
Methods
  • The authors used a single robust version of REMBO that automatically sets its GP’s length scale parameter using a variant of maximum likelihood.
  • 4.1 Bayesian Optimization in a Billion Dimensions.
  • The experiments employ a standard d_e = 2-dimensional benchmark function for Bayesian optimization, embedded in a D-dimensional space.
  • The function whose optimum the authors seek is f(x_{1:D}) = g(x_i, x_j), where g is the Branin function and the coordinates i and j are selected once using a random permutation; a small sketch of this benchmark appears after this list.
  • To measure the performance of each optimization method, the authors used the optimality gap: the difference between the best function value found and the optimal function value
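A concrete version of this benchmark and metric, under our own naming, might look as follows: the standard two-dimensional Branin function g is evaluated on just two coordinates x_i, x_j of a D-dimensional input, so the remaining D - 2 dimensions are irrelevant by construction, and the optimality gap compares the best value found with Branin's known minimum (about 0.397887). The mapping of the D-dimensional search box onto Branin's native domain is omitted here for brevity.

```python
import numpy as np

def branin(u, v):
    """Standard 2-D Branin function; global minimum is about 0.397887."""
    a, b, c = 1.0, 5.1 / (4 * np.pi**2), 5.0 / np.pi
    r, s, t = 6.0, 10.0, 1.0 / (8 * np.pi)
    return a * (v - b * u**2 + c * u - r)**2 + s * (1 - t) * np.cos(u) + s

def make_embedded_branin(D, seed=0):
    """Hide the two relevant coordinates among D via a random permutation."""
    i, j = np.random.default_rng(seed).permutation(D)[:2]   # chosen once, then fixed
    def f(x):              # x is a length-D vector; only x[i] and x[j] matter
        return branin(x[i], x[j])
    return f

def optimality_gap(best_value_found, optimum=0.397887):
    """Difference between the best function value found and the optimum."""
    return abs(best_value_found - optimum)
```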
Conclusion
  • This paper has shown that it is possible to use random embeddings in Bayesian optimization to optimize functions of high extrinsic dimensionality D, provided that they have low intrinsic dimensionality d_e.
  • The authors demonstrated that REMBO achieves excellent performance for optimizing the 47 discrete parameters of a popular mixed integer programming solver, thereby providing further evidence for the observation that, for many problems of great practical interest, the number of important dimensions appears to be much lower than their extrinsic dimensionality.
  • The success achieved in the examples presented in this paper is very encouraging.
Tables
  • Table 1: Optimality gap for the d_e = 2-dimensional Branin function embedded in D = 25 dimensions, for REMBO variants using a total of 500 function evaluations. The variants differ in the internal dimensionality d and in the number of interleaved runs k (each such run was allowed only 500/k function evaluations). We show the mean and standard deviation of the optimality gap achieved after 500 function evaluations.
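The interleaving in Table 1 can be read as a simple budget split: k independent REMBO runs, each with its own freshly drawn random embedding, together spend the 500 evaluations (500/k each), and the best result across runs is kept, which hedges against an unlucky embedding. A minimal driver under that reading (the `optimizer` callable and its signature are hypothetical, and `make_rembo_objective` is the sketch given earlier) could be:

```python
def interleaved_rembo(f, D, d, k, optimizer, total_evals=500):
    """Split the evaluation budget across k runs with independent random
    embeddings and keep the best (lowest) value found, as for the Branin
    minimization benchmark in Table 1. Sketch only; `optimizer` is assumed
    to return the best function value it found within its budget."""
    budget = total_evals // k
    results = []
    for run in range(k):
        f_low = make_rembo_objective(f, D, d, seed=run)   # fresh embedding per run
        results.append(optimizer(f_low, dim=d, n_evals=budget))
    return min(results)
```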
Best Paper of IJCAI, 2013