Model Selection for Production System via Automated Online Experiments

NeurIPS 2020.

Keywords: randomized experiment, automated online experimentation, upper confidence bound, ML model, model uncertainty

Abstract:

A challenge that machine learning practitioners in the industry face is the task of selecting the best model to deploy in production. As a model is often an intermediate component of a production system, online controlled experiments such as A/B tests yield the most reliable estimation of the effectiveness of the whole system, but can only…

Introduction
  • Evaluating the effect of individual changes to machine learning (ML) systems such as choice of algorithms, features, etc., is the key to growth in many internet services and industrial applications.
  • Classical model selection paradigms such as cross-validation consider ML models in isolation and focus on selecting the model with the best predictive power on unseen data.
  • This approach does not work well for modern industrial ML systems, as such a system usually consists of many individual components and an ML model is only one of them.
Highlights
  • Evaluating the effect of individual changes to machine learning (ML) systems such as choice of algorithms, features, etc., is the key to growth in many internet services and industrial applications
  • We propose a new framework of model selection for production system (MSPS), where the best model is selected via deploying a sequence of models online
  • Model selection for a production system does not fit into the classical model selection paradigm
  • We propose a new approach by taking data collection into the model selection process and selecting the best model via iterative online experiments
  • The model to deploy at each iteration is picked by balancing the predicted accumulative metric against the uncertainty of that prediction due to limited data (see the sketch after this list)
  • With simulated experiments from real data, we show that automated online experimentation (AOE) performs significantly better than all the baselines in terms of identifying the best model and estimating the accumulative metric
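    To make this selection criterion concrete, the following is a minimal Python sketch of an upper-confidence-bound (UCB) style score that trades off the predicted accumulative metric against its uncertainty. The surrogate.predict interface, the candidate_models list and the beta weight are illustrative assumptions, not the paper's exact acquisition function.

        def ucb_score(surrogate, model, beta=2.0):
            # Predicted accumulative metric plus an exploration bonus.
            # `surrogate.predict` is assumed to return (mean, variance) for a candidate model.
            mean, var = surrogate.predict(model)
            return mean + beta * var ** 0.5

        # Pick the model to deploy in the next online experiment round:
        # next_model = max(candidate_models, key=lambda m: ucb_score(surrogate, m))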
Results
  • With simulated experiments from real data, the authors show that AOE performs significantly better than all the baselines in terms of identifying the best model and estimating the accumulative metric.
Conclusion
  • Model selection for a production system does not fit into the classical model selection paradigm.
  • The authors propose a new approach by taking data collection into the model selection process and selecting the best model via iterative online experiments.
  • It allows selection from a much larger pool of candidates than A/B testing and gives more accurate selection than off-policy evaluation (OPE) by actively reducing selection bias (see the sketch after this list).
  • With simulated experiments from real data, the authors show that AOE performs significantly better than all the baselines in terms of identifying the best model and estimating the accumulative metric.
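    For context on the off-policy evaluation baseline mentioned above: OPE typically estimates a candidate model's accumulative metric from data logged under the currently deployed model, e.g. with an inverse-propensity-scoring (IPS) estimator; because the candidate is never deployed, the estimate degrades when it behaves very differently from the logging model. The sketch below is a generic IPS estimator for illustration only, not the specific estimator the paper compares against.

        import numpy as np

        def ips_estimate(logs, candidate_prob, logging_prob):
            # logs: list of (context, action, reward) tuples collected under the deployed model.
            # candidate_prob / logging_prob: probability each policy assigns to the logged action.
            weights = np.array([candidate_prob(x, a) / logging_prob(x, a) for x, a, _ in logs])
            rewards = np.array([r for _, _, r in logs])
            return float(np.mean(weights * rewards))  # importance-weighted average reward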
Related work
  • Model selection [18] is a classical topic in ML. The standard paradigm of model selection considers a model in isolation and aims at selecting the model with the best predictive power on unseen data based on an offline dataset. Common techniques such as cross-validation, bootstrapping, the Akaike information criterion [AIC, 19] and the Bayesian information criterion [BIC, 20] have been widely used for scoring a model's predictive power on a given dataset. As scoring all the candidate models does not scale for complex problems, many recent works focus on searching a large continuous and/or combinatorial space of model configurations, ranging from hyper-parameter optimization [HPO, 10, 21] and the automatic statistician [22, 23, 24, 25] to neural architecture search [NAS, 26]. A more recent work [27] jointly considers the scoring and searching problems for computational efficiency. Online model selection [28, 29] is an extension of the standard model selection paradigm: it still treats a model in isolation but considers the online learning scenario, in which data arrive sequentially and the models are continuously updated. This is different from MSPS, which views a model in the context of a bigger system and actively controls the data collection.

    Algorithm 1: model selection with automated online experiments (AOE)
    Result: the ML system with the highest accumulative metric
    Collect the initial data D_0;
    while the online experiment budget is not exhausted do
        Infer p(f | A, X, D_{t-1}) with variational inference (VI) on the surrogate model;
        Identify M_t = argmax_{M_i ∈ M} α(M_i);
        Deploy M_t and construct D_t by augmenting the newly collected data into D_{t-1};
    end
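    The iterative structure of Algorithm 1 can be summarised in a short Python sketch. It assumes a surrogate object with fit/predict methods, a deploy_and_log function that runs the chosen model online for one experiment round, and the ucb_score acquisition sketched earlier; these names are hypothetical, not the authors' implementation.

        import numpy as np

        def aoe_loop(candidates, surrogate, deploy_and_log, acquisition, initial_data, budget):
            # Sketch of Algorithm 1 (AOE): repeatedly deploy the candidate with the best
            # acquisition score and fold the newly logged data back into the surrogate.
            data = list(initial_data)                        # D_0
            for _ in range(budget):
                surrogate.fit(data)                          # infer p(f | A, X, D_{t-1}), e.g. with VI
                scores = [acquisition(surrogate, m) for m in candidates]
                chosen = candidates[int(np.argmax(scores))]  # M_t = argmax_i alpha(M_i)
                data += deploy_and_log(chosen)               # D_t = D_{t-1} plus the new online logs
            surrogate.fit(data)                              # refit on all collected data
            means = [surrogate.predict(m)[0] for m in candidates]
            return candidates[int(np.argmax(means))]         # highest predicted accumulative metric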
Reference
  • [1] R. Kohavi, R. M. Henne, and D. Sommerfield, “Practical guide to controlled experiments on the web: Listen to your customers not to the hippo,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’07, New York, NY, USA, pp. 959–967, 2007.
  • [2] K. Hofmann, L. Li, and F. Radlinski, “Online evaluation for information retrieval,” Foundations and Trends in Information Retrieval, vol. 10, pp. 1–117, 2016.
  • [3] M. Sanderson, “Test collection based evaluation of information retrieval systems,” Foundations and Trends in Information Retrieval, vol. 4, no. 4, pp. 247–375, 2010.
  • [4] D. Precup, R. S. Sutton, and S. P. Singh, “Eligibility traces for off-policy policy evaluation,” in ICML, pp. 759–766, 2000.
  • [5] M. Dudík, D. Erhan, J. Langford, and L. Li, “Doubly robust policy evaluation and optimization,” Statistical Science, vol. 29, pp. 485–511, 2014.
  • [6] M. Farajtabar, Y. Chow, and M. Ghavamzadeh, “More robust doubly robust off-policy evaluation,” in Proceedings of the 35th International Conference on Machine Learning, pp. 1447–1456, 2018.
  • [7] Y. Liu, O. Gottesman, A. Raghu, M. Komorowski, A. A. Faisal, F. Doshi-Velez, and E. Brunskill, “Representation balancing MDPs for off-policy policy evaluation,” in Advances in Neural Information Processing Systems 31, pp. 2644–2653, 2018.
  • [8] N. Vlassis, A. Bibaut, M. Dimakopoulou, and T. Jebara, “On the design of estimators for bandit off-policy evaluation,” in Proceedings of the 36th International Conference on Machine Learning, 2019.
  • [9] A. Irpan, K. Rao, K. Bousmalis, C. Harris, J. Ibarz, and S. Levine, “Off-policy evaluation via off-policy classification,” in Advances in Neural Information Processing Systems 32, pp. 5437–5448, 2019.
  • [10] J. Snoek, H. Larochelle, and R. P. Adams, “Practical Bayesian optimization of machine learning algorithms,” in Advances in Neural Information Processing Systems, pp. 2951–2959, 2012.
  • [11] Z. Dai, M. Álvarez, and N. Lawrence, “Efficient modeling of latent information in supervised learning using Gaussian processes,” in Advances in Neural Information Processing Systems 30, pp. 5131–5139, 2017.
  • [12] A. Damianou and N. Lawrence, “Deep Gaussian processes,” in Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, pp. 207–215, 2013.
  • [13] Z. Dai, A. C. Damianou, J. González, and N. D. Lawrence, “Variational auto-encoded deep Gaussian processes,” in ICLR, 2016.
  • [14] M. Titsias, “Variational learning of inducing variables in sparse Gaussian processes,” in Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, pp. 567–574, 2009.
  • [15] E. Snelson and Z. Ghahramani, “Sparse Gaussian processes using pseudo-inputs,” in Advances in Neural Information Processing Systems 18, pp. 1257–1264, 2006.
  • [16] J. Hensman, N. Fusi, and N. D. Lawrence, “Gaussian processes for big data,” in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, pp. 282–290, 2013.
  • [17] P. Hennig and C. J. Schuler, “Entropy search for information-efficient global optimization,” Journal of Machine Learning Research, vol. 13, no. 57, pp. 1809–1837, 2012.
  • [18] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Berlin, Heidelberg: Springer-Verlag, 2006.
  • [19] H. Akaike, “A new look at the statistical model identification,” IEEE Transactions on Automatic Control, vol. 19, no. 6, pp. 716–723, 1974.
  • [20] G. Schwarz, “Estimating the dimension of a model,” Annals of Statistics, vol. 6, pp. 461–464, 1978.
  • [21] A. Klein, Z. Dai, F. Hutter, N. Lawrence, and J. Gonzalez, “Meta-surrogate benchmarking for hyperparameter optimization,” in Advances in Neural Information Processing Systems 32, pp. 6270–6280, 2019.
  • [22] J. R. Lloyd, D. Duvenaud, R. Grosse, J. B. Tenenbaum, and Z. Ghahramani, “Automatic construction and natural-language description of nonparametric regression models,” in Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, pp. 1242–1250, 2014.
  • [23] G. Malkomes, C. Schaff, and R. Garnett, “Bayesian optimization for automated model selection,” in Advances in Neural Information Processing Systems 29, pp. 2900–2908, 2016.
  • [24] H. Kim and Y. W. Teh, “Scaling up the automatic statistician: Scalable structure discovery using Gaussian processes,” in Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, pp. 575–584, 2018.
  • [25] X. Lu, J. Gonzalez, Z. Dai, and N. Lawrence, “Structured variationally auto-encoded optimization,” in Proceedings of the 35th International Conference on Machine Learning, pp. 3267–3275, 2018.
  • [26] T. Elsken, J. H. Metzen, and F. Hutter, “Neural architecture search: A survey,” Journal of Machine Learning Research, vol. 20, no. 55, pp. 1–21, 2019.
  • [27] H. Chai, J.-F. Ton, M. A. Osborne, and R. Garnett, “Automated model selection with Bayesian quadrature,” in Proceedings of the 36th International Conference on Machine Learning, pp. 931–940, 2019.
  • [28] M. Sato, “Online model selection based on the variational Bayes,” Neural Computation, vol. 13, no. 7, pp. 1649–1681, 2001.
  • [29] V. Muthukumar, M. Ray, A. Sahai, and P. Bartlett, “Best of many worlds: Robust model selection for online supervised learning,” in Proceedings of Machine Learning Research, pp. 3177–3186, 2019.
  • [30] M. Ghavamzadeh and Y. Engel, “Bayesian policy gradient algorithms,” in Advances in Neural Information Processing Systems, pp. 457–464, 2007.
  • [31] M. Ghavamzadeh, Y. Engel, and M. Valko, “Bayesian policy gradient and actor-critic algorithms,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 2319–2371, 2016.
  • [32] G. Lee, B. Hou, A. Mandalika, J. Lee, and S. S. Srinivasa, “Bayesian policy optimization for model uncertainty,” in International Conference on Learning Representations, 2019.
  • [33] B. Letham and E. Bakshy, “Bayesian optimization for policy search via online-offline experimentation,” Journal of Machine Learning Research, vol. 20, no. 145, pp. 1–30, 2019.
  • [34] D. Russo, “Simple Bayesian algorithms for best arm identification,” in 29th Annual Conference on Learning Theory, pp. 1417–1418, 2016.
  • [35] K. Chaloner and I. Verdinelli, “Bayesian experimental design: A review,” Statistical Science, pp. 273–304, 1995.
  • [36] J. M. Hernández-Lobato, M. W. Hoffman, and Z. Ghahramani, “Predictive entropy search for efficient global optimization of black-box functions,” in Advances in Neural Information Processing Systems, pp. 918–926, 2014.
  • [37] A. Foster, M. Jankowiak, E. Bingham, P. Horsfall, Y. W. Teh, T. Rainforth, and N. Goodman, “Variational Bayesian optimal experimental design,” in Advances in Neural Information Processing Systems, pp. 14036–14047, 2019.
  • [38] J. Vanlier, C. A. Tiemann, P. A. Hilbers, and N. A. van Riel, “A Bayesian approach to targeted experiment design,” Bioinformatics, vol. 28, no. 8, pp. 1136–1142, 2012.
  • [39] D. Golovin, A. Krause, and D. Ray, “Near-optimal Bayesian active learning with noisy observations,” in Advances in Neural Information Processing Systems, pp. 766–774, 2010.
  • [40] B. Shababo, B. Paige, A. Pakman, and L. Paninski, “Bayesian inference and online experimental design for mapping neural microcircuits,” in Advances in Neural Information Processing Systems, pp. 1304–1312, 2013.
  • [41] D. Dua and C. Graff, “UCI machine learning repository,” 2017.
  • [42] The GPyOpt authors, “GPyOpt: A Bayesian optimization framework in Python.” http://github.com/SheffieldML/GPyOpt, 2016.
  • [43] F. M. Harper and J. A. Konstan, “The MovieLens datasets: History and context,” ACM Trans. Interact. Intell. Syst., vol. 5, no. 4, 2015.
  • [44] N. Hug, “Surprise, a Python library for recommender systems.” http://surpriselib.com, 2017.