# Model Selection for Production System via Automated Online Experiments

NeurIPS 2020.

Keywords:

randomized experiment, automated online experimentation, upper confidence bound, ML model, model uncertainty

Abstract:

A challenge that machine learning practitioners in industry face is selecting the best model to deploy in production. As a model is often an intermediate component of a production system, online controlled experiments such as A/B tests yield the most reliable estimate of the effectiveness of the whole system, but can onl...

Introduction

- Evaluating the effect of individual changes to machine learning (ML) systems such as choice of algorithms, features, etc., is the key to growth in many internet services and industrial applications.
- Classical model selection paradigms such as cross-validation consider ML models in isolation and focus on selecting the model with the best predictive power on unseen data.
- This approach does not work well for modern industrial ML systems, because such a system usually consists of many individual components and an ML model is only one of them.

Highlights

- We propose a new framework of model selection for production systems, in which the best model is selected by deploying a sequence of models online
- Model selection for production systems does not fit into the classical model selection paradigm
- We propose a new approach that incorporates data collection into the model selection process and selects the best model via iterative online experiments
- The model to deploy at each iteration is picked by balancing the predicted accumulative metric against the uncertainty of that prediction due to limited data
- With simulated experiments from real data, we show that automated online experimentation (AOE) performs significantly better than all the baselines in terms of identifying the best model and estimating the accumulative metric
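The balance between predicted metric and uncertainty described in the highlights can be written as an upper-confidence-bound style acquisition. The form below is an illustrative assumption and not necessarily the exact acquisition used by the authors: $\mu_{t-1}(M_i)$ and $\sigma_{t-1}(M_i)$ stand for a surrogate model's predictive mean and standard deviation of the accumulative metric of candidate $M_i$ given the data collected so far, and $\beta > 0$ is a hypothetical exploration weight.

```latex
\alpha(M_i) = \mu_{t-1}(M_i) + \beta \, \sigma_{t-1}(M_i),
\qquad
M_t = \arg\max_{M_i \in \mathcal{M}} \alpha(M_i)
```

A larger $\beta$ favours deploying models whose metric is still poorly known, while $\beta \to 0$ reduces to always deploying the model with the best current prediction.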

Results

- With simulated experiments from real data, the authors show that AOE performs significantly better than all the baselines in terms of identifying the best model and estimating the accumulative metric.

Conclusion

- Model selection for production systems does not fit into the classical model selection paradigm.
- The authors propose a new approach that incorporates data collection into the model selection process and selects the best model via iterative online experiments.
- It allows selection from a much larger pool of candidates than using A/B testing and gives more accurate selection than off-policy evaluation by actively reducing selection bias.

Related work

- Model selection [18] is a classical topic in ML. The standard paradigm considers a model in isolation and aims to select the model with the best predictive power on unseen data, based on an offline dataset. Common techniques such as cross-validation, bootstrapping, the Akaike information criterion [AIC, 19] and the Bayesian information criterion [BIC, 20] have been widely used for scoring a model's predictive power based on a given dataset.
- As scoring all the candidate models does not scale for complex problems, many recent works focus on searching a large continuous and/or combinatorial space of model configurations, ranging from hyper-parameter optimization [HPO, 10, 21] and the automatic statistician [22, 23, 24, 25] to neural network architecture search [NAS, 26]. A more recent work [27] jointly considers the scoring and searching problems for computational efficiency.
- Online model selection [28, 29] is an extension of the standard model selection paradigm. It still treats a model in isolation but considers the online learning scenario, in which data arrive sequentially and the models are continuously updated. This differs from model selection for production system (MSPS), which views a model in the context of a bigger system and actively controls the data collection.

Algorithm 1: Model selection with automated online experiments (AOE)

Result: Return the ML system with the highest accumulative metric.

- Collect the initial data $D_0$.
- While the online experiment budget is not over:
  - Infer $p(f \mid A, X, D_{t-1})$ with variational inference (VI) on the surrogate model.
  - Identify $M_t = \arg\max_{M_i \in \mathcal{M}} \alpha(M_i)$.
  - Deploy $M_t$ and construct $D_t$ by augmenting $D_{t-1}$ with the collected data.
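The loop in Algorithm 1 can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the authors' implementation: the variational surrogate model is swapped for independent per-model sample means, the acquisition is a simple upper-confidence-bound rule, and online deployments are simulated from hypothetical `true_metrics`. The names `run_aoe_sketch`, `beta`, and `noise` are made up for this example.

```python
import math
import random


def run_aoe_sketch(true_metrics, budget, beta=2.0, noise=0.1, seed=0):
    """Toy version of the Algorithm 1 loop (assumption-laden sketch).

    true_metrics: hypothetical mean online metric of each candidate model;
                  hidden from the selector, used only to simulate deployments.
    budget:       number of online experiments after the initial round.
    beta:         assumed exploration weight of the UCB-style acquisition.
    noise:        assumed standard deviation of the observation noise.
    """
    rng = random.Random(seed)
    n_models = len(true_metrics)

    def deploy(i):
        # Simulated online experiment: a noisy observation of the metric.
        return true_metrics[i] + rng.gauss(0.0, noise)

    # Collect the initial data D_0: one deployment per candidate model.
    data = {i: [deploy(i)] for i in range(n_models)}

    for _ in range(budget):
        # Stand-in for surrogate inference: sample mean per model, with a
        # standard error that shrinks as more data is collected.
        mean = {i: sum(obs) / len(obs) for i, obs in data.items()}
        std = {i: noise / math.sqrt(len(obs)) for i, obs in data.items()}

        # Acquisition: predicted metric plus an uncertainty bonus.
        chosen = max(range(n_models), key=lambda i: mean[i] + beta * std[i])

        # Deploy the chosen model and augment the collected data.
        data[chosen].append(deploy(chosen))

    # Return the model with the highest estimated metric and the number of
    # deployments each candidate received.
    final_mean = {i: sum(obs) / len(obs) for i, obs in data.items()}
    best = max(range(n_models), key=lambda i: final_mean[i])
    counts = [len(data[i]) for i in range(n_models)]
    return best, counts
```

For example, `run_aoe_sketch([0.1, 0.5, 0.9], budget=50, noise=0.05)` returns the index of the selected model together with per-model deployment counts. Because the estimates here are independent, an observation on one model says nothing about the others; a shared surrogate, as in Algorithm 1, is what would make a large candidate pool tractable.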

Reference

- R. Kohavi, R. M. Henne, and D. Sommerfield, “Practical guide to controlled experiments on the web: Listen to your customers not to the hippo,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’07, (New York, NY, USA), p. 959–967, 2007.
- K. Hofmann, L. Li, and F. Radlinski, “Online evaluation for information retrieval,” Foundations and Trends® in Information Retrieval, vol. 10, pp. 1–117, June 2016.
- M. Sanderson, “Test collection based evaluation of information retrieval systems,” Foundations and Trends® in Information Retrieval, vol. 4, no. 4, pp. 247–375, 2010.
- D. Precup, R. S. Sutton, and S. P. Singh, “Eligibility traces for off-policy policy evaluation,” in ICML, pp. 759–766, 2000.
- M. Dudík, D. Erhan, J. Langford, and L. Li, “Doubly robust policy evaluation and optimization,” Statistical Science, vol. 29, pp. 485–511, 11 2014.
- M. Farajtabar, Y. Chow, and M. Ghavamzadeh, “More robust doubly robust off-policy evaluation,” in Proceedings of the 35th International Conference on Machine Learning, pp. 1447–1456, 2018.
- Y. Liu, O. Gottesman, A. Raghu, M. Komorowski, A. A. Faisal, F. Doshi-Velez, and E. Brunskill, “Representation balancing mdps for off-policy policy evaluation,” in Advances in Neural Information Processing Systems 31, pp. 2644–2653, 2018.
- N. Vlassis, A. Bibaut, M. Dimakopoulou, and T. Jebara, “On the design of estimators for bandit off-policy evaluation,” in Proceedings of the 36th International Conference on Machine Learning, 2019.
- A. Irpan, K. Rao, K. Bousmalis, C. Harris, J. Ibarz, and S. Levine, “Off-policy evaluation via off-policy classification,” in Advances in Neural Information Processing Systems 32, pp. 5437–5448, 2019.
- J. Snoek, H. Larochelle, and R. P. Adams, “Practical bayesian optimization of machine learning algorithms,” in Advances in neural information processing systems, pp. 2951–2959, 2012.
- Z. Dai, M. Álvarez, and N. Lawrence, “Efficient modeling of latent information in supervised learning using gaussian processes,” in Advances in Neural Information Processing Systems 30, pp. 5131–5139, 2017.
- A. Damianou and N. Lawrence, “Deep gaussian processes,” in Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, pp. 207–215, 2013.
- Z. Dai, A. C. Damianou, J. González, and N. D. Lawrence, “Variational auto-encoded deep gaussian processes,” in ICLR, 2016.
- M. Titsias, “Variational learning of inducing variables in sparse gaussian processes,” in Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, pp. 567–574, 2009.
- E. Snelson and Z. Ghahramani, “Sparse gaussian processes using pseudo-inputs,” in Advances in Neural Information Processing Systems 18, pp. 1257–1264, 2006.
- J. Hensman, N. Fusi, and N. D. Lawrence, “Gaussian processes for big data,” in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, p. 282–290, 2013.
- P. Hennig and C. J. Schuler, “Entropy search for information-efficient global optimization,” Journal of Machine Learning Research, vol. 13, no. 57, pp. 1809–1837, 2012.
- C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Berlin, Heidelberg: Springer-Verlag, 2006.
- H. Akaike, “A new look at the statistical model identification,” IEEE Transactions on Automatic Control, vol. 19, no. 6, pp. 716–723, 1974.
- G. Schwarz, “Estimating the dimension of a model,” Annals of Statistics, vol. 6, pp. 461–464, 03 1978.
- A. Klein, Z. Dai, F. Hutter, N. Lawrence, and J. Gonzalez, “Meta-surrogate benchmarking for hyperparameter optimization,” in Advances in Neural Information Processing Systems 32, pp. 6270–6280, 2019.
- J. R. Lloyd, D. Duvenaud, R. Grosse, J. B. Tenenbaum, and Z. Ghahramani, “Automatic construction and natural-language description of nonparametric regression models,” in Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, p. 1242–1250, 2014.
- G. Malkomes, C. Schaff, and R. Garnett, “Bayesian optimization for automated model selection,” in Advances in Neural Information Processing Systems 29, pp. 2900–2908, 2016.
- H. Kim and Y. W. Teh, “Scaling up the automatic statistician: Scalable structure discovery using gaussian processes,” in Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, pp. 575–584, 2018.
- X. Lu, J. Gonzalez, Z. Dai, and N. Lawrence, “Structured variationally auto-encoded optimization,” in Proceedings of the 35th International Conference on Machine Learning, pp. 3267–3275, 2018.
- T. Elsken, J. H. Metzen, and F. Hutter, “Neural architecture search: A survey,” Journal of Machine Learning Research, vol. 20, no. 55, pp. 1–21, 2019.
- H. Chai, J.-F. Ton, M. A. Osborne, and R. Garnett, “Automated model selection with Bayesian quadrature,” in Proceedings of the 36th International Conference on Machine Learning, pp. 931–940, 2019.
- M. Sato, “Online model selection based on the variational bayes,” Neural Computation, vol. 13, no. 7, pp. 1649–1681, 2001.
- V. Muthukumar, M. Ray, A. Sahai, and P. Bartlett, “Best of many worlds: Robust model selection for online supervised learning,” in Proceedings of Machine Learning Research, pp. 3177–3186, 2019.
- M. Ghavamzadeh and Y. Engel, “Bayesian policy gradient algorithms,” in Advances in neural information processing systems, pp. 457–464, 2007.
- M. Ghavamzadeh, Y. Engel, and M. Valko, “Bayesian policy gradient and actor-critic algorithms,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 2319–2371, 2016.
- G. Lee, B. Hou, A. Mandalika, J. Lee, and S. S. Srinivasa, “Bayesian policy optimization for model uncertainty,” in International Conference on Learning Representations, 2019.
- B. Letham and E. Bakshy, “Bayesian optimization for policy search via online-offline experimentation,” Journal of Machine Learning Research, vol. 20, no. 145, pp. 1–30, 2019.
- D. Russo, “Simple bayesian algorithms for best arm identification,” in 29th Annual Conference on Learning Theory, pp. 1417–1418, 2016.
- K. Chaloner and I. Verdinelli, “Bayesian experimental design: A review,” Statistical Science, pp. 273–304, 1995.
- J. M. Hernández-Lobato, M. W. Hoffman, and Z. Ghahramani, “Predictive entropy search for efficient global optimization of black-box functions,” in Advances in neural information processing systems, pp. 918–926, 2014.
- A. Foster, M. Jankowiak, E. Bingham, P. Horsfall, Y. W. Teh, T. Rainforth, and N. Goodman, “Variational bayesian optimal experimental design,” in Advances in Neural Information Processing Systems, pp. 14036–14047, 2019.
- J. Vanlier, C. A. Tiemann, P. A. Hilbers, and N. A. van Riel, “A bayesian approach to targeted experiment design,” Bioinformatics, vol. 28, no. 8, pp. 1136–1142, 2012.
- D. Golovin, A. Krause, and D. Ray, “Near-optimal bayesian active learning with noisy observations,” in Advances in Neural Information Processing Systems, pp. 766–774, 2010.
- B. Shababo, B. Paige, A. Pakman, and L. Paninski, “Bayesian inference and online experimental design for mapping neural microcircuits,” in Advances in Neural Information Processing Systems, pp. 1304–1312, 2013.
- D. Dua and C. Graff, “UCI machine learning repository,” 2017.
- The GPyOpt authors, “GPyOpt: A bayesian optimization framework in python.” http://github.com/SheffieldML/GPyOpt, 2016.
- F. M. Harper and J. A. Konstan, “The movielens datasets: History and context,” ACM Trans. Interact. Intell. Syst., vol. 5, no. 4, 2015.
- N. Hug, “Surprise, a Python library for recommender systems.” http://surpriselib.com, 2017.
