AI Insight
AI-extracted summary of this paper
High-Dimensional Contextual Policy Search with Unknown Context Rewards using Bayesian Optimization
NeurIPS 2020, (2020): 22032–22044
Abstract
Contextual policies are used in many settings to customize system parameters and actions to the specifics of a particular setting. In some real-world settings, such as randomized controlled trials or A/B tests, it may not be possible to measure policy outcomes at the level of context—we observe only aggregate rewards across a distribution…
Introduction
- Contextual policies are used in a wide range of applications, such as robotics [22, 30] and computing platforms [9].
- The optimal policy for a particular ABR controller may depend on the network—for instance, a stream with large fluctuations in bandwidth will benefit from different ABR parameters than a stream with stable bandwidth.
- This motivates the use of a contextual policy where ABR parameters are personalized by context variables such as country or network type (2G, 3G, 4G, etc.).
- Another set of methods for high-dimensional BO assumes low-dimensional linear [44, 6, 36, 7, 34, 25] or nonlinear [15, 27, 32] structure in the problem.
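As a hedged illustration of the linear-embedding idea mentioned in the last bullet (the random-embedding approach of methods such as REMBO), the sketch below searches a low-dimensional space and maps candidates into the high-dimensional domain through a random linear map. The objective `f`, the dimensions, and the use of random search in place of a BO acquisition loop are all illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 100, 4                      # ambient and embedding dimensions (illustrative)
A = rng.normal(size=(D, d))        # random linear embedding, as in REMBO-style methods

def f(x):
    # Hypothetical high-dimensional objective; only a few dimensions matter,
    # which is the structural assumption these methods exploit.
    return -np.sum(x[:5] ** 2)

def f_low(y):
    # Evaluate through the embedding, clipping back into the feasible box.
    x = np.clip(A @ y, -1.0, 1.0)
    return f(x)

# Random search over the low-dimensional space stands in for the BO inner loop.
best = max(f_low(rng.uniform(-1, 1, size=d)) for _ in range(200))
```

The point of the sketch is only that the search happens in `d` dimensions while evaluations happen in `D` dimensions; a real implementation would replace the random search with a GP surrogate and an acquisition function.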
Highlights
- We develop new Gaussian process (GP) models that take advantage of the problem structure to significantly improve over existing Bayesian optimization (BO) approaches
- We provide a thorough simulation study that shows how the models scale with factors such as the number of contexts and the population distribution of contexts, considering both aggregate rewards and fairness
- We introduce a new real-world problem for contextual policy optimization (CPO), optimizing a contextual adaptive bitrate (ABR) policy, and show that our models perform best relative to a wide range of alternative approaches
- We develop two kernels that allow for effective BO in this space by taking advantage of the particular structure of the aggregated CPO problem
- Performance of standard BO methods rapidly degraded while the other methods found significantly better policies across the full range of contexts
- The latent context embedding additive (LCE-A) model makes it possible to optimize in high-dimensional policy spaces by leveraging plausible inductive biases for contextual policies
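A minimal sketch of the aggregated-reward structure the kernels above exploit: the observed aggregate reward is a weighted sum of unobserved per-context rewards, so the covariance between two aggregate observations decomposes over pairs of contexts. The embedding-based context similarity below is an assumption standing in for the paper's learned latent context embeddings, and all names and shapes are illustrative.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # Squared-exponential base kernel on a pair of vectors.
    return np.exp(-0.5 * np.sum((a - b) ** 2) / ls ** 2)

def aggregate_kernel(x1, x2, w, emb, ls=1.0):
    """Covariance between two aggregate-reward observations.

    x1, x2 : (C, d) per-context policy parameters (one row per context)
    w      : (C,) context weights (the population distribution of contexts)
    emb    : (C, m) latent context embeddings; cross-context covariance is
             modeled as rbf(emb[c], emb[c']) — an illustrative stand-in for
             the learned embedding kernel in the paper
    """
    C = len(w)
    k = 0.0
    for c in range(C):
        for cp in range(C):
            k += (w[c] * w[cp]
                  * rbf(emb[c], emb[cp], ls)   # context similarity
                  * rbf(x1[c], x2[cp], ls))    # policy-parameter similarity
    return k
```

Because every term is a product of positive weights and positive base-kernel values, the resulting kernel is symmetric and positive on any inputs, which is what makes it usable as a GP covariance over aggregate observations.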
Results
- Performance of standard BO methods rapidly degraded while the other methods found significantly better policies across the full range of contexts.
Conclusion
- The authors have shown that it is possible to deploy and optimize contextual policies even when rewards cannot be measured at the level of context.
- The LCE-A model makes it possible to optimize in high-dimensional policy spaces by leveraging plausible inductive biases for contextual policies.
- This improves top-level aggregate rewards relative to non-contextual policies, and improves the fairness of the policy by improving outcomes across all contexts.
- The authors hope that future work can consider leveraging pre-trained, unsupervised representations of contexts to reduce the burden of learning good embeddings of contexts from scratch, which would further enable the method to scale to a very large number of contexts
References
- Introducing tensorflow feature columns. https://developers.googleblog.com/2017/11/introducing-tensorflow-feature-columns.html. Accessed: 2020-06-05.
- Mohan R Akella, Rajan Batta, Moises Sudit, Peter Rogerson, and Alan Blatt. Cellular network configuration with co-channel and adjacent-channel interference constraints. Computers & Operations Research, 35(12):3738–3757, 2008.
- Mauricio A. Álvarez, Lorenzo Rosasco, and Neil D. Lawrence. Kernels for vector-valued functions: A review. Foundations and Trends in Machine Learning, 4(3):195–266, 2012.
- Raul Astudillo and Peter I Frazier. Bayesian optimization of composite functions. arXiv preprint arXiv:1906.01537, 2019.
- Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, and Eytan Bakshy. BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. In Advances in Neural Information Processing Systems 33, NeurIPS, 2020.
- Mickaël Binois, David Ginsbourger, and Olivier Roustant. A warped kernel improving robustness in Bayesian optimization via random embeddings. In Proceedings of the International Conference on Learning and Intelligent Optimization, LION, pages 281–286, 2015.
- Mickaël Binois, David Ginsbourger, and Olivier Roustant. On the choice of the low-dimensional domain for global optimization via random embeddings. Journal of Global Optimization, 76(1):69–90, 2020.
- Edwin V. Bonilla, Kian Ming A. Chai, and Christopher K. I. Williams. Multi-task Gaussian process prediction. In Advances in Neural Information Processing Systems 20, NIPS, pages 153–160, 2007.
- Ian Char, Youngseog Chung, Willie Neiswanger, Kirthevasan Kandasamy, Andrew Oakleigh Nelson, Mark Boyer, Egemen Kolemen, and Jeff Schneider. Offline contextual Bayesian optimization. In Advances in Neural Information Processing Systems 32, NeurIPS, pages 4627–4638, 2019.
- Sam Corbett-Davies and Sharad Goel. The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023, 2018.
- Jeffrey Dean and Luiz André Barroso. The tail at scale. Communications of the ACM, 56(2):74–80, February 2013.
- David K. Duvenaud, Hannes Nickisch, and Carl E. Rasmussen. Additive Gaussian processes. In Advances in Neural Information Processing Systems 24, NIPS, pages 226–234, 2011.
- David Eriksson, Kun Dong, Eric Lee, David Bindel, and Andrew G. Wilson. Scaling Gaussian process regression with derivatives. In Advances in Neural Information Processing Systems 31, NeurIPS, pages 6867–6877, 2018.
- Jacob Gardner, Chuan Guo, Kilian Q. Weinberger, Roman Garnett, and Roger Grosse. Discovering and exploiting additive structure for Bayesian optimization. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS, pages 1311–1319, 2017.
- Rafael Gómez-Bombarelli, Jennifer N. Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268–276, 2018.
- Cheng Guo and Felix Berkhahn. Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737, 2016.
- Nikolaus Hansen, Sibylle D. Müller, and Petros Koumoutsakos. Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary Computation, 11(1):1–18, 2003.
- Kohei Hayashi, Takashi Takenouchi, Ryota Tomioka, and Hisashi Kashima. Self-measuring similarity for multi-task Gaussian process. In Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pages 145–153, 2012.
- Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998.
- Kirthevasan Kandasamy, Jeff Schneider, and Barnabás Póczos. High dimensional Bayesian optimisation and bandits via additive models. In International Conference on Machine Learning, ICML, pages 295–304, 2015.
- Andreas Krause and Cheng S. Ong. Contextual Gaussian process bandit optimization. In Advances in Neural Information Processing Systems 24, NIPS, pages 2447–2455, 2011.
- Andras Gabor Kupcsik, Marc Peter Deisenroth, Jan Peters, and Gerhard Neumann. Data-efficient generalization of robot skills with contextual policy search. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI, pages 1401–1407, 2013.
- Jeffrey Scott Lehman. Sequential design of computer experiments for robust parameter design. PhD thesis, The Ohio State University, 2002.
- Benjamin Letham and Eytan Bakshy. Bayesian optimization for policy search via online-offline experimentation. Journal of Machine Learning Research, 20(145):1–30, 2019.
- Benjamin Letham, Roberto Calandra, Akshara Rai, and Eytan Bakshy. Re-examining linear embeddings for high-dimensional Bayesian optimization. In Advances in Neural Information Processing Systems 33, NeurIPS, 2020.
- Lihong Li, Wei Chu, John Langford, and Robert E Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, WWW, pages 661–670, 2010.
- Xiaoyu Lu, Javier González, Zhenwen Dai, and Neil Lawrence. Structured variationally auto-encoded optimization. In Proceedings of the 35th International Conference on Machine Learning, ICML, pages 3267–3275, 2018.
- Hongzi Mao, Shannon Chen, Drew Dimmery, Shaun Singh, Drew Blaisdell, Yuandong Tian, Mohammad Alizadeh, and Eytan Bakshy. Real-world video adaptation with reinforcement learning. arXiv preprint arXiv:2008.12858, 2020.
- Hongzi Mao, Parimarjan Negi, Akshay Narayan, Hanrui Wang, Jiacheng Yang, Haonan Wang, Ryan Marcus, Ravichandra Addanki, Mehrdad Khani, Songtao He, Vikram Nathan, Frank Cangialosi, Shaileshh Bojja Venkatakrishnan, Wei-Hung Weng, Song Han, Tim Kraska, and Mohammad Alizadeh. Park: An open platform for learning-augmented computer systems. In Advances in Neural Information Processing Systems 32, NeurIPS, pages 2490–2502, 2019.
- Jan Hendrik Metzen, Alexander Fabisch, and Jonas Hansen. Bayesian optimization for contextual policy search. In Proceedings of the Second Machine Learning in Planning and Control of Robot Motion Workshop, IROS Workshop, MLPC, 2015.
- Jacob M. Montgomery, Brendan Nyhan, and Michelle Torres. How conditioning on posttreatment variables can ruin your experiment and what to do about it. American Journal of Political Science, 62(3):760–775, 2018.
- Riccardo Moriconi, K. S. Sesh Kumar, and Marc P. Deisenroth. High-dimensional Bayesian optimization with manifold Gaussian processes. arXiv preprint arXiv:1902.10675, 2019.
- Mojmír Mutný and Andreas Krause. Efficient high dimensional Bayesian optimization with additivity and quadrature Fourier features. In Advances in Neural Information Processing Systems 31, NeurIPS, pages 9005–9016, 2018.
- Amin Nayebi, Alexander Munteanu, and Matthias Poloczek. A framework for Bayesian optimization in embedded subspaces. In Proceedings of the 36th International Conference on Machine Learning, ICML, pages 4752–4761, 2019.
- Art B Owen. Scrambling Sobol' and Niederreiter–Xing points. Journal of Complexity, 14(4):466–489, 1998.
- Hong Qian, Yi-Qi Hu, and Yang Yu. Derivative-free optimization of high-dimensional nonconvex functions by sequential random embeddings. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI, pages 1946–1952, 2016.
- Peter Z. G. Qian, Huaiqing Wu, and C. F. Jeff Wu. Gaussian process models for computer experiments with qualitative and quantitative factors. Technometrics, 50(3):383–396, 2008.
- Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, Massachusetts, 2006.
- Paul Rolland, Jonathan Scarlett, Ilija Bogunovic, and Volkan Cevher. High-dimensional Bayesian optimization via additive models with overlapping groups. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, AISTATS, pages 298–307, 2018.
- Kevin Swersky, Jasper Snoek, and Ryan P. Adams. Multi-task Bayesian optimization. In Advances in Neural Information Processing Systems 26, NIPS, pages 2004–2012, 2013.
- Matthew Tesch, Jeff Schneider, and Howie Choset. Adapting control policies for expensive systems to changing environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS, pages 357–364, 2011.
- Zi Wang, Clement Gehring, Pushmeet Kohli, and Stefanie Jegelka. Batched large-scale Bayesian optimization in high-dimensional spaces. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, AISTATS, 2018.
- Zi Wang, Chengtao Li, Stefanie Jegelka, and Pushmeet Kohli. Batched high-dimensional Bayesian optimization via structural kernel learning. In Proceedings of the 34th International Conference on Machine Learning, ICML, pages 3656–3664, 2017.
- Ziyu Wang, Frank Hutter, Masrour Zoghi, David Matheson, and Nando de Freitas. Bayesian optimization in a billion dimensions via random embeddings. Journal of Artificial Intelligence Research, 55:361–387, 2016.
- Brian J. Williams, Thomas J. Santner, and William I. Notz. Sequential design of computer experiments to minimize integrated response functions. Statistica Sinica, 10(4):1133–1152, 2000.
- Xiaoqi Yin, Abhishek Jindal, Vyas Sekar, and Bruno Sinopoli. A control-theoretic approach for dynamic adaptive video streaming over HTTP. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, SIGCOMM, pages 325–338, 2015.
- Yichi Zhang, Daniel W. Apley, and Wei Chen. Bayesian optimization for materials design with mixed quantitative and qualitative variables. Scientific Reports, 10(4924), 2020.