On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning
JMLR Workshop and Conference Proceedings, pp. 365-374, 2014.
Keywords:
good arm identification, probability of improvement, upper confidence bounds, frequentist counterpart, function evaluation
Abstract:
We address the problem of finding the maximizer of a nonlinear function that can only be evaluated, subject to noise, at a finite number of query locations. Further, we will assume that there is a constraint on the total number of permitted function evaluations. We introduce a Bayesian approach for this problem and show that it empirically outperforms both the existing frequentist counterpart and other Bayesian optimization methods.
Introduction
- This paper draws connections between Bayesian optimization approaches and best arm identification in the bandit setting.
- In order to attack the problem of Bayesian optimization from a bandit perspective the authors will consider a finite collection of arms A = {1, . . . , K}.
- The authors will introduce a gap-based solution to the Bayesian optimization problem, which they call BayesGap. This approach builds on the work of Gabillon et al [2011, 2012], which the authors refer to as UGap, and offers a principled way to incorporate correlation between different arms (a minimal sketch of this gap-based selection rule is given below).
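The gap-based selection rule referenced above is compact enough to state in code. The following Python sketch is our illustration of a UGap/BayesGap-style round, assuming high-probability bounds U and L for every arm have already been computed; all names here are illustrative, not from the paper:

```python
import numpy as np

def gap_based_selection(U, L):
    """One round of gap-based (UGap/BayesGap-style) arm selection.

    U, L: length-K arrays of high-probability upper/lower bounds on
    each arm's unknown mean (K >= 2 assumed).
    Returns (J, j): J is the current best-arm candidate; j is the arm
    to evaluate this round.
    """
    K = len(U)
    # Gap index B_k = max_{i != k} U_i - L_k: the worst-case amount by
    # which arm k could lose to the best alternative arm.
    B = np.array([np.max(np.delete(U, k)) - L[k] for k in range(K)])
    J = int(np.argmin(B))                  # candidate best arm
    others = np.delete(np.arange(K), J)
    u = int(others[np.argmax(U[others])])  # strongest competitor
    # Evaluate whichever of the two is more uncertain (larger bound
    # diameter), since that evaluation shrinks the relevant bounds most.
    j = J if (U[J] - L[J]) >= (U[u] - L[u]) else u
    return J, j
```

In UGap-style algorithms, once the evaluation budget is exhausted the candidate J(t) from the round with the smallest gap index is returned as the recommended arm.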
Highlights
- This paper draws connections between Bayesian optimization approaches and best arm identification in the bandit setting
- We introduce a Bayesian approach that meets the above design goals and show that it empirically outperforms both the existing frequentist counterpart and other Bayesian optimization methods.
- The Bayesian approach places emphasis on detailed modelling, including the modelling of correlations among the arms. It can perform well in situations where the number of arms is much larger than the number of allowed function evaluations, whereas the frequentist counterpart is inapplicable.
- We proposed a Bayesian optimization method for best arm identification with a fixed budget
- While we focused on a Bayesian treatment of the UGap algorithm, the same approach could conceivably be applied to other techniques such as UCBE.
Results
- At the beginning of round t the authors will assume that the decision maker is equipped with high-probability upper and lower bounds Uk(t) and Lk(t) on the unknown mean μk for each arm.
- While this approach can encompass more general bounds, for the Gaussian-arms setting that the authors consider in this work these quantities can be defined in terms of the posterior mean and standard deviation, i.e. Uk(t), Lk(t) = μ̂kt ± βσ̂kt (see the code sketch following this list).
- Consider a K-armed Gaussian bandit problem, horizon T , and upper and lower bounds defined as above.
- Comparison with UGap. The method provides a Bayesian version of the UGap algorithm which modifies the bounds used in this earlier algorithm’s arm selection step.
- In order to evaluate different bandit and Bayesian optimization algorithms, the authors use each of the remaining 840 sensor signals as the true mean vector μ for independent runs of the experiment.
- Note that using the model in this way enables them to evaluate the ground truth for each run, and estimate the actual probability that the policies return the best arm.
- The authors benchmark the proposed algorithm (BayesGap) against the following methods: (1) UCBE: Introduced by Audibert et al [2010]; this is a variant of the classical UCB policy of Auer et al [2002] that replaces the log(t) exploration term of UCB with a constant of order log(T ) for known horizon T .
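To make the two styles of bound concrete, here is a minimal sketch, under our own assumptions (a zero-mean Gaussian prior with known K x K covariance Sigma over the arm means, and i.i.d. Gaussian observation noise), of Bayesian bounds of the form μ̂kt ± βσ̂kt, together with the frequentist UCB-E index for comparison. The function names and parameterization are illustrative, not the paper's:

```python
import numpy as np

def posterior_bounds(Sigma, arms, rewards, noise_var, beta):
    """Bayesian confidence bounds for correlated Gaussian arms.

    Sigma:     K x K prior covariance over the arm means (off-diagonal
               entries encode correlation between arms).
    arms:      integer indices of arms pulled so far.
    rewards:   the corresponding noisy observations.
    noise_var: observation noise variance.
    beta:      width multiplier for the bounds.
    Returns (U, L) with U_k, L_k = mu_hat_k +/- beta * sigma_hat_k.
    """
    K = Sigma.shape[0]
    counts = np.bincount(arms, minlength=K).astype(float)
    sums = np.bincount(arms, weights=rewards, minlength=K)
    # Conjugate Gaussian update: precisions add; invert to get covariance.
    post_prec = np.linalg.inv(Sigma) + np.diag(counts) / noise_var
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (sums / noise_var)  # zero prior mean assumed
    std = np.sqrt(np.diag(post_cov))
    return post_mean + beta * std, post_mean - beta * std

def ucbe_index(emp_means, counts, a):
    """Frequentist UCB-E index of Audibert et al. [2010]: UCB's log(t)
    exploration term is replaced by a constant a of order log(T) for a
    known horizon T. Assumes every arm has been pulled at least once;
    arms are treated as uncorrelated."""
    return emp_means + np.sqrt(a / counts)
```

The off-diagonal entries of Sigma are what allow an observation on one arm to tighten the bounds of correlated arms, which is precisely the effect the uncorrelated UCB-E index cannot capture.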
Conclusion
- Note that techniques (1) and (2) above attack the problem of best arm identification and use bounds which encourage more aggressive exploration.
- (Here the authors used ε = 0, but varying this quantity had little effect on the performance of each algorithm.) By looking at the results, the authors quickly learn that techniques that model correlation perform better than the techniques designed for best arm identification, even when they are being evaluated in a best arm identification task.
- The authors proposed a Bayesian optimization method for best arm identification with a fixed budget.
Related work
- Bayesian optimization has enjoyed success in a broad range of optimization tasks; see the work of Brochu et al [2010b] for a broad overview. Recently, this approach has received a great deal of attention as a black-box technique for the optimization of hyperparameters [Snoek et al, 2012, Hutter et al, 2011, Wang et al, 2013b]. This type of optimization combines prior knowledge about the objective function with previous observations to estimate the posterior distribution over f. The posterior distribution, in turn, is used to construct an acquisition function that determines what the next query point a_t should be. Examples of acquisition functions include probability of improvement (PI), expected improvement (EI), Bayesian upper confidence bounds (UCB), and mixtures of these [Mockus, 1982, Jones, 2001, Srinivas et al, 2010, Hoffman et al, 2011]. One of the key strengths underlying the use of Bayesian optimization is the ability to capture complicated correlation structures via the posterior distribution.
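As a concrete illustration of the acquisition functions named above, the following sketch evaluates PI, EI, and the Bayesian UCB at a single candidate point from its Gaussian posterior, assuming a maximization problem; the constants beta and xi are illustrative defaults, not values from any of the cited papers:

```python
import numpy as np
from scipy.stats import norm

def acquisitions(mu, sigma, best, beta=2.0, xi=0.01):
    """Standard acquisition values at one candidate point.

    mu, sigma: posterior mean and standard deviation at the point.
    best:      incumbent (best observed) objective value.
    beta, xi:  exploration constants (illustrative defaults).
    """
    z = (mu - best - xi) / sigma
    pi = norm.cdf(z)                       # probability of improvement
    ei = (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    ucb = mu + beta * sigma                # Bayesian upper confidence bound
    return pi, ei, ucb
```

In practice each acquisition is evaluated over all candidate points (or arms) and the next query is the maximizer of the chosen acquisition.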
Reference
- S. Agrawal and N. Goyal. Thompson sampling for contextual bandits with linear payoffs. In ICML, 2013.
- S. Arlot and A. Celisse. A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40–79, 2010.
- J.-Y. Audibert, S. Bubeck, and R. Munos. Best arm identification in multi-armed bandits. In COLT, 2010.
- P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2):235–256, 2002.
- J. Azimi, A. Fern, and X. Fern. Budgeted optimization with concurrent stochastic-duration experiments. In NIPS, pages 1098–1106, 2011.
- J. Bergstra, R. Bardenet, Y. Bengio, and B. Kegl. Algorithms for hyper-parameter optimization. In NIPS, pages 2546–2554, 2011.
- E. Brochu, N. de Freitas, and A. Ghosh. Active preference learning with discrete choice data. In NIPS, pages 409–416, 2007.
- E. Brochu, T. Brochu, and N. de Freitas. A Bayesian interactive optimization approach to procedural animation design. In ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 103–112, 2010a.
- E. Brochu, V. Cora, and N. de Freitas. A tutorial on Bayesian optimization of expensive cost functions. Technical Report arXiv:1012.2599, 2010b.
- S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in multi-armed bandits problems. In International Conference on Algorithmic Learning Theory, 2009.
- P. Burman. A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika, 76(3):503–514, 1989.
- N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, 2006.
- O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In NIPS, 2012.
- N. de Freitas, A. Smola, and M. Zoghi. Exponential Regret Bounds for Gaussian Process Bandits with Deterministic Observations. In ICML, 2012.
- V. Gabillon, M. Ghavamzadeh, A. Lazaric, and S. Bubeck. Multi-bandit best arm identification. In NIPS, 2011.
- V. Gabillon, M. Ghavamzadeh, and A. Lazaric. Best arm identification: A unified approach to fixed budget and fixed confidence. In NIPS, 2012.
- F. Hamze, Z. Wang, and N. de Freitas. Self-avoiding random dynamics on integer complex systems. ACM Transactions on Modelling and Computer Simulation, 23(1):9:1–9:25, 2013.
- P. Hennig and C. Schuler. Entropy search for information-efficient global optimization. JMLR, 13:1809–1837, 2012.
- M. W. Hoffman, E. Brochu, and N. de Freitas. Portfolio allocation for Bayesian optimization. In UAI, pages 327–336, 2011.
- F. Hutter, H. H. Hoos, and K. Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Proceedings of LION-5, pages 507–523, 2011.
- D. Jones. A taxonomy of global optimization methods based on response surfaces. J. of Global Optimization, 21(4):345–383, 2001.
- E. Kaufmann and S. Kalyanakrishnan. Information complexity in bandit subset selection. In Conference on Learning Theory, pages 228–251, 2013.
- E. Kaufmann, O. Cappe, and A. Garivier. On Bayesian upper confidence bounds for bandit problems. In AISTATS, 2012a.
- E. Kaufmann, N. Korda, and R. Munos. Thompson sampling: an asymptotically optimal finite-time analysis. In International Conference on Algorithmic Learning Theory, 2012b.
- R. Kohavi, R. Longbotham, D. Sommerfield, and R. Henne. Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery, 18:140–181, 2009.
- H. Kueck, N. de Freitas, and A. Doucet. SMC samplers for Bayesian optimal nonlinear design. In IEEE Nonlinear Statistical Signal Processing Workshop, pages 99–102, 2006.
- H. Kueck, M. Ho↵man, A. Doucet, and N. de Freitas. Inference and learning for active sensing, experimental design and control. In H. Araujo, A. Mendonca, A. Pinho, and M. Torres, editors, Pattern Recognition and Image Analysis, volume 5524, pages 1–10. Springer Berlin Heidelberg, 2009.
- D. J. Lizotte, R. Greiner, and D. Schuurmans. An experimental methodology for response surface optimization methods. Journal of Global Optimization, 53(4):699–736, 2012.
- N. Mahendran, Z. Wang, F. Hamze, and N. de Freitas. Adaptive MCMC with Bayesian optimization. Journal of Machine Learning Research - Proceedings Track, 22:751–760, 2012.
- O. Maron and A. W. Moore. Hoeffding races: Accelerating model selection search for classification and function approximation. In NIPS, pages 59–66, 1994.
- R. Martinez-Cantin, N. de Freitas, A. Doucet, and J. A. Castellanos. Active policy learning for robot planning and exploration under uncertainty. In Robotics: Science and Systems, 2007.
- R. Munos. Optimistic optimization of a deterministic function without the knowledge of its smoothness. In NIPS, pages 783–791, 2011.
- K. P. Murphy. Machine learning: A probabilistic perspective. MIT Press, Cambridge, MA, 2012.
- S. Scott. A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 26(6), 2010.
- J. Snoek, H. Larochelle, and R. P. Adams. Opportunity cost in Bayesian optimization. In Neural Information Processing Systems Workshop on Bayesian Optimization, 2011.
- J. Snoek, H. Larochelle, and R. Adams. Practical Bayesian optimization of machine learning algorithms. In NIPS, pages 2960–2968, 2012.
- N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In ICML, 2010.
- K. Swersky, J. Snoek, and R. P. Adams. Multi-task Bayesian optimization. In Neural Information Processing Systems, 2013.
- C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown. Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In KDD, pages 847–855, 2013.
- M. Valko, A. Carpentier, and R. Munos. Stochastic simultaneous optimistic optimization. In ICML, 2013.
- J. Villemonteix, E. Vazquez, and E. Walter. An informational approach to the global optimization of expensive-to-evaluate functions. Journal of Global Optimization, 44(4):509–534, 2009.
- Z. Wang, S. Mohamed, and N. de Freitas. Adaptive Hamiltonian and Riemann manifold Monte Carlo samplers. In ICML, 2013a.
- Z. Wang, M. Zoghi, D. Matheson, F. Hutter, and N. de Freitas. Bayesian optimization in high dimensions via random embeddings. In IJCAI, 2013b.