On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning

JMLR Workshop and Conference Proceedings, pp. 365-374, 2014.

Keywords:
good arm identification, probability of improvement, upper confidence bounds, frequentist counterpart, function evaluation

Abstract:

We address the problem of finding the maximizer of a nonlinear function that can only be evaluated, subject to noise, at a finite number of query locations. Further, we will assume that there is a constraint on the total number of permitted function evaluations. We introduce a Bayesian approach for this problem and show that it empirically outperforms both the existing frequentist counterpart and other Bayesian optimization methods.

Introduction
  • This paper draws connections between Bayesian optimization approaches and best arm identification in the bandit setting.
  • In order to attack the problem of Bayesian optimization from a bandit perspective, the authors consider a finite collection of arms A = {1, ..., K}.
  • The authors introduce a gap-based solution to the Bayesian optimization problem, which they call BayesGap. This approach builds on the work of Gabillon et al. [2011, 2012], which the authors refer to as UGap, and offers a principled way to incorporate correlation between different arms.
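For intuition, the following is a minimal sketch of one round of UGap-style arm selection, assuming high-probability bounds U and L are already available for every arm (BayesGap obtains them from a posterior model, as sketched later in the Results section). The function name and tie-breaking details are illustrative, not the authors' exact implementation:

```python
import numpy as np

def gap_arm_selection(U, L):
    """One UGap/BayesGap-style selection round, given per-arm
    high-probability upper bounds U and lower bounds L."""
    U, L = np.asarray(U, dtype=float), np.asarray(L, dtype=float)
    K = len(U)
    # Gap index of arm k: the best competing upper bound minus k's own
    # lower bound. A small B[k] means arm k plausibly has the largest mean.
    B = np.array([np.max(np.delete(U, k)) - L[k] for k in range(K)])
    J = int(np.argmin(B))                      # most promising candidate
    rest = np.array([k for k in range(K) if k != J])
    j = int(rest[np.argmax(U[rest])])          # strongest challenger
    # Pull the more uncertain of the two, since shrinking the wider
    # confidence interval reduces the gap between them fastest.
    arm = J if U[J] - L[J] >= U[j] - L[j] else j
    return arm, J
```

When the budget is exhausted, a natural recommendation is the candidate arm from the round in which its gap index was smallest.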
Highlights
  • This paper draws connections between Bayesian optimization approaches and best arm identification in the bandit setting
  • We introduce a Bayesian approach that meets the above design goals and show that it empirically outperforms both the existing frequentist counterpart and other Bayesian optimization methods.
  • The Bayesian approach places emphasis on detailed modelling, including the modelling of correlations among the arms. It can perform well in situations where the number of arms is much larger than the number of allowed function evaluations, whereas the frequentist counterpart is inapplicable.
  • We proposed a Bayesian optimization method for best arm identification with a fixed budget
  • While we focused on a Bayesian treatment of the UGap algorithm, the same approach could conceivably be applied to other techniques such as UCBE.
Results
  • At the beginning of round t, the decision maker is assumed to be equipped with high-probability upper and lower bounds Uk(t) and Lk(t) on the unknown mean μk for each arm.
  • While this approach can encompass more general bounds, for the Gaussian-arms setting considered in this work these quantities can be defined in terms of the posterior mean and standard deviation, i.e. μ̂kt ± βσ̂kt for an exploration coefficient β (a sketch of computing such bounds appears after this list).
  • Consider a K-armed Gaussian bandit problem, horizon T , and upper and lower bounds defined as above.
  • Comparison with UGap. The method provides a Bayesian version of the UGap algorithm which modifies the bounds used in this earlier algorithm’s arm selection step.
  • In order to evaluate different bandit and Bayesian optimization algorithms, the authors use each of the remaining 840 sensor signals as the true mean vector μ for independent runs of the experiment.
  • Note that using the model in this way enables the authors to evaluate the ground truth for each run, and estimate the actual probability that the policies return the best arm.
  • The authors benchmark the proposed algorithm (BayesGap) against the following methods: (1) UCBE: Introduced by Audibert et al. [2010]; this is a variant of the classical UCB policy of Auer et al. [2002] that replaces the log(t) exploration term of UCB with a constant of order log(T) for known horizon T.
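As referenced in the bounds discussion above, here is a minimal sketch of computing correlated-arm bounds of the form μ̂kt ± βσ̂kt, assuming a zero-mean Gaussian prior over the vector of arm means with a known covariance matrix that encodes correlation between arms; the argument names and the fixed beta are illustrative stand-ins (the paper derives its exploration coefficient from the budget):

```python
import numpy as np

def posterior_bounds(prior_cov, noise_var, pulls, rewards, beta):
    """Return (U, L) = mu +/- beta * sigma for K correlated Gaussian arms.

    prior_cov: K x K prior covariance over the arm means (the correlation model).
    pulls:     indices of the arms pulled so far.
    rewards:   observed rewards, one per pull.
    """
    K = prior_cov.shape[0]
    n = len(pulls)
    if n == 0:
        mu, var = np.zeros(K), np.diag(prior_cov).copy()
    else:
        X = np.zeros((n, K))
        X[np.arange(n), pulls] = 1.0   # pulling arm k observes mu_k plus noise
        S = X @ prior_cov @ X.T + noise_var * np.eye(n)
        G = prior_cov @ X.T @ np.linalg.inv(S)   # posterior "gain" matrix
        mu = G @ np.asarray(rewards, dtype=float)
        var = np.diag(prior_cov - G @ X @ prior_cov)
    sigma = np.sqrt(np.clip(var, 0.0, None))
    return mu + beta * sigma, mu - beta * sigma
```

Because the prior covariance couples the arms, each pull tightens the bounds of every correlated arm, which is what allows the Bayesian approach to handle many more arms than function evaluations.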
Conclusion
  • Note that techniques (1) and (2) above attack the problem of best arm identification and use bounds which encourage more aggressive exploration.
  • (Here the authors used ε = 0, but varying this quantity had little effect on the performance of each algorithm.) The results show that techniques that model correlation perform better than the techniques designed for best arm identification, even when they are being evaluated in a best arm identification task.
  • The authors proposed a Bayesian optimization method for best arm identification with a fixed budget.
Related work
  • Bayesian optimization has enjoyed success in a broad range of optimization tasks; see the work of Brochu et al. [2010b] for a broad overview. Recently, this approach has received a great deal of attention as a black-box technique for the optimization of hyperparameters [Snoek et al., 2012, Hutter et al., 2011, Wang et al., 2013b]. This type of optimization combines prior knowledge about the objective function with previous observations to estimate the posterior distribution over f. The posterior distribution, in turn, is used to construct an acquisition function that determines what the next query point a_t should be. Examples of acquisition functions include probability of improvement (PI), expected improvement (EI), Bayesian upper confidence bounds (UCB), and mixtures of these [Mockus, 1982, Jones, 2001, Srinivas et al., 2010, Hoffman et al., 2011]. One of the key strengths underlying the use of Bayesian optimization is the ability to capture complicated correlation structures via the posterior distribution.
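To make the acquisition functions named above concrete, here is a minimal sketch of PI, EI, and UCB evaluated under a Gaussian posterior; the incumbent value `best` and the exploration parameters `kappa` and `xi` follow common conventions and are not taken from the paper:

```python
import numpy as np
from scipy.stats import norm

def acquisitions(mu, sigma, best, kappa=2.0, xi=0.0):
    """PI, EI, and UCB at candidate points with posterior mean mu, std sigma."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.maximum(np.asarray(sigma, dtype=float), 1e-12)  # avoid div by 0
    z = (mu - best - xi) / sigma
    pi = norm.cdf(z)                              # probability of improvement
    ei = sigma * (z * norm.cdf(z) + norm.pdf(z))  # expected improvement
    ucb = mu + kappa * sigma                      # Bayesian upper confidence bound
    return pi, ei, ucb
```

Whichever acquisition is chosen, the next query point a_t is its argmax over the candidate locations.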
Contributions
  • Addresses the problem of finding the maximizer of a nonlinear function that can only be evaluated, subject to noise, at a finite number of query locations.
  • Introduces a Bayesian approach that meets the above design goals and shows that it empirically outperforms both the existing frequentist counterpart and other Bayesian optimization methods.
References
  • S. Agrawal and N. Goyal. Thompson sampling for contextual bandits with linear payoffs. In ICML, 2013.
  • S. Arlot and A. Celisse. A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40–79, 2010.
  • J.-Y. Audibert, S. Bubeck, and R. Munos. Best arm identification in multi-armed bandits. In COLT, 2010.
  • P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2):235–256, 2002.
  • J. Azimi, A. Fern, and X. Fern. Budgeted optimization with concurrent stochastic-duration experiments. In NIPS, pages 1098–1106, 2011.
  • J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl. Algorithms for hyper-parameter optimization. In NIPS, pages 2546–2554, 2011.
  • E. Brochu, N. de Freitas, and A. Ghosh. Active preference learning with discrete choice data. In NIPS, pages 409–416, 2007.
  • E. Brochu, T. Brochu, and N. de Freitas. A Bayesian interactive optimization approach to procedural animation design. In ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 103–112, 2010a.
  • E. Brochu, V. Cora, and N. de Freitas. A tutorial on Bayesian optimization of expensive cost functions. Technical Report arXiv:1012.2599, 2010b.
  • S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in multi-armed bandits problems. In International Conference on Algorithmic Learning Theory, 2009.
  • P. Burman. A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika, 76(3):503–514, 1989.
  • N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, 2006.
  • O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In NIPS, 2011.
  • N. de Freitas, A. Smola, and M. Zoghi. Exponential regret bounds for Gaussian process bandits with deterministic observations. In ICML, 2012.
  • V. Gabillon, M. Ghavamzadeh, A. Lazaric, and S. Bubeck. Multi-bandit best arm identification. In NIPS, 2011.
  • V. Gabillon, M. Ghavamzadeh, and A. Lazaric. Best arm identification: A unified approach to fixed budget and fixed confidence. In NIPS, 2012.
  • F. Hamze, Z. Wang, and N. de Freitas. Self-avoiding random dynamics on integer complex systems. ACM Transactions on Modeling and Computer Simulation, 23(1):9:1–9:25, 2013.
  • P. Hennig and C. Schuler. Entropy search for information-efficient global optimization. JMLR, 13:1809–1837, 2012.
  • M. W. Hoffman, E. Brochu, and N. de Freitas. Portfolio allocation for Bayesian optimization. In UAI, pages 327–336, 2011.
  • F. Hutter, H. H. Hoos, and K. Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Proceedings of LION-5, pages 507–523, 2011.
  • D. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21(4):345–383, 2001.
  • E. Kaufmann and S. Kalyanakrishnan. Information complexity in bandit subset selection. In Conference on Learning Theory, pages 228–251, 2013.
  • E. Kaufmann, O. Cappé, and A. Garivier. On Bayesian upper confidence bounds for bandit problems. In AISTATS, 2012a.
  • E. Kaufmann, N. Korda, and R. Munos. Thompson sampling: An asymptotically optimal finite-time analysis. In International Conference on Algorithmic Learning Theory, 2012b.
  • R. Kohavi, R. Longbotham, D. Sommerfield, and R. Henne. Controlled experiments on the web: Survey and practical guide. Data Mining and Knowledge Discovery, 18:140–181, 2009.
  • H. Kueck, N. de Freitas, and A. Doucet. SMC samplers for Bayesian optimal nonlinear design. In IEEE Nonlinear Statistical Signal Processing Workshop, pages 99–102, 2006.
  • H. Kueck, M. Hoffman, A. Doucet, and N. de Freitas. Inference and learning for active sensing, experimental design and control. In H. Araujo, A. Mendonca, A. Pinho, and M. Torres, editors, Pattern Recognition and Image Analysis, volume 5524, pages 1–10. Springer Berlin Heidelberg, 2009.
  • D. J. Lizotte, R. Greiner, and D. Schuurmans. An experimental methodology for response surface optimization methods. Journal of Global Optimization, 53(4):699–736, 2012.
  • N. Mahendran, Z. Wang, F. Hamze, and N. de Freitas. Adaptive MCMC with Bayesian optimization. Journal of Machine Learning Research - Proceedings Track, 22:751–760, 2012.
  • O. Maron and A. W. Moore. Hoeffding races: Accelerating model selection search for classification and function approximation. In NIPS, pages 59–66, 1994.
  • R. Martinez-Cantin, N. de Freitas, A. Doucet, and J. A. Castellanos. Active policy learning for robot planning and exploration under uncertainty. In Robotics: Science and Systems, 2007.
  • R. Munos. Optimistic optimization of a deterministic function without the knowledge of its smoothness. In NIPS, pages 783–791, 2011.
  • K. P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge, MA, 2012.
  • S. Scott. A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 26(6), 2010.
  • J. Snoek, H. Larochelle, and R. P. Adams. Opportunity cost in Bayesian optimization. In Neural Information Processing Systems Workshop on Bayesian Optimization, 2011.
  • J. Snoek, H. Larochelle, and R. P. Adams. Practical Bayesian optimization of machine learning algorithms. In NIPS, pages 2960–2968, 2012.
  • N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In ICML, 2010.
  • K. Swersky, J. Snoek, and R. P. Adams. Multi-task Bayesian optimization. In NIPS, 2013.
  • C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown. Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In KDD, pages 847–855, 2013.
  • M. Valko, A. Carpentier, and R. Munos. Stochastic simultaneous optimistic optimization. In ICML, 2013.
  • J. Villemonteix, E. Vazquez, and E. Walter. An informational approach to the global optimization of expensive-to-evaluate functions. Journal of Global Optimization, 44(4):509–534, 2009.
  • Z. Wang, S. Mohamed, and N. de Freitas. Adaptive Hamiltonian and Riemann manifold Monte Carlo samplers. In ICML, 2013a.
  • Z. Wang, M. Zoghi, D. Matheson, F. Hutter, and N. de Freitas. Bayesian optimization in high dimensions via random embeddings. In IJCAI, 2013b.