# Algorithms for Hyper-Parameter Optimization

NIPS 2011, pp. 2546–2554.

## Abstract

Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyper-parameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible.

## Introduction

- Models such as Deep Belief Networks (DBNs) [2], stacked denoising autoencoders [3], convolutional networks [4], as well as classifiers based on sophisticated feature extraction techniques have from ten to perhaps fifty hyper-parameters, depending on how the experimenter chooses to parametrize the model, and how many hyper-parameters the experimenter chooses to fix at a reasonable default.
- The difficulty of tuning these models makes published results difficult to reproduce and extend, and makes even the original investigation of such methods more of an art than a science.
- Recent results such as [5], [6], and [7] demonstrate that the challenge of hyper-parameter optimization in large and multilayer models is a direct impediment to scientific progress.
- The results of [5] and [7] suggest that with current-generation hardware such as large computer clusters and GPUs, the optimal allocation of CPU cycles includes more hyper-parameter exploration than has been typical in the machine learning literature; a minimal random-search sketch follows this list.
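
That allocation argument is the motivation for the paper's random-search baseline: draw each hyper-parameter independently from a prior distribution, train, and keep the best trial. Below is a minimal sketch; `sample_config`, its names, and its ranges are illustrative stand-ins, not the paper's Table 1 space.

```python
import math
import random

def sample_config(rng):
    # Hypothetical search space for illustration only (not the paper's Table 1).
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),    # log-uniform prior
        "n_hidden": rng.choice([128, 256, 512, 1024]),
        "weight_decay": 10 ** rng.uniform(-6, -2),
    }

def random_search(objective, n_trials=200, seed=0):
    # Evaluate independent draws from the prior and keep the best trial.
    rng = random.Random(seed)
    best_config, best_loss = None, math.inf
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = objective(config)   # e.g. validation error after training with `config`
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss
```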

## Highlights

- This paper has introduced two sequential hyper-parameter optimization algorithms and shown them to meet or exceed human performance and the performance of brute-force random search on two difficult hyper-parameter optimization tasks involving Deep Belief Networks.
- Roughly half of the variables are sometimes irrelevant, depending on the values of other parameters. In this 32-dimensional search problem, the Tree-structured Parzen Estimator (TPE) algorithm presented here has uncovered new best results on both of these datasets that are significantly better than what Deep Belief Networks were previously believed to achieve.
- The Gaussian Process (GP) approach and the TPE approach are practical: the optimization for each dataset was done in just 24 hours using five GPU processors (a toy sketch of the TPE idea follows this list).
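
For intuition, here is a toy one-dimensional sketch of the TPE idea: split the observed trials at a quantile of the loss, fit a density l(x) to the "good" configurations and g(x) to the rest, and suggest the candidate that maximizes the ratio l(x)/g(x). The fixed-bandwidth KDE, the perturbation-based candidate sampling, and all names here are simplifications of the paper's adaptive Parzen estimators over tree-structured spaces, not its implementation.

```python
import numpy as np

def kde_pdf(points, xs, bandwidth=0.1):
    # Fixed-bandwidth Gaussian kernel density estimate of `points`, evaluated at `xs`.
    diffs = (xs[:, None] - points[None, :]) / bandwidth
    return np.exp(-0.5 * diffs ** 2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))

def tpe_suggest(x_hist, y_hist, rng, gamma=0.15, n_candidates=64):
    """Suggest the candidate maximizing l(x)/g(x), where l models the best
    gamma-fraction of observed losses and g models the rest."""
    x_hist = np.asarray(x_hist, dtype=float)
    y_hist = np.asarray(y_hist, dtype=float)
    split = max(1, int(np.ceil(gamma * len(y_hist))))
    assert len(y_hist) > split, "need both 'good' and 'bad' observations"
    order = np.argsort(y_hist)
    good, bad = x_hist[order[:split]], x_hist[order[split:]]
    # Draw candidates near the good points (a crude stand-in for sampling the
    # Parzen mixture l), then score each one by the density ratio l/g.
    candidates = rng.choice(good, size=n_candidates) + 0.1 * rng.standard_normal(n_candidates)
    scores = kde_pdf(good, candidates) / (kde_pdf(bad, candidates) + 1e-12)
    return candidates[int(np.argmax(scores))]

# Usage: x_next = tpe_suggest(past_xs, past_losses, np.random.default_rng(0))
```

Maximizing l(x)/g(x) is, under TPE's p(x|y) parametrization, equivalent to maximizing expected improvement, which is why TPE fits the same sequential model-based framework as the GP approach.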

## Results

- On the convex task, TPE’s best was significantly better than both manual search (19% error) and random search with 200 trials (17% error).

## Conclusion

- The trajectories (observation histories H) constructed by each algorithm over up to 200 trials are illustrated in Figure 4 and compared with random search and the manual search carried out in [1].
- On the MRBI dataset (10-way classification), random search was the worst performer (50% error), the GP approach and manual search approximately tied (47% error), while the TPE algorithm found a new best result (44% error).
- Roughly half of the variables are sometimes irrelevant, depending on the values of other parameters.
- In this 32-dimensional search problem, the TPE algorithm presented here has uncovered new best results on both of these datasets that are significantly better than what DBNs were previously believed to achieve.
- Although the authors' results are only for DBNs, the methods are quite general and extend naturally to any hyper-parameter optimization problem in which the hyper-parameters are drawn from a measurable set; a sketch of the expected-improvement criterion that both algorithms optimize follows this list.
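
Both sequential algorithms pick the next trial by maximizing the expected improvement (EI) criterion. A minimal sketch for minimization, assuming a Gaussian predictive distribution N(mu, sigma^2) at a candidate point (as a GP posterior would give); the function name and the threshold `y_star` are illustrative:

```python
import math

def expected_improvement(mu, sigma, y_star):
    """EI(x) = E[max(y_star - y, 0)] for y ~ N(mu, sigma^2), for minimization."""
    if sigma <= 0.0:
        return max(y_star - mu, 0.0)   # degenerate (certain) prediction
    z = (y_star - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (y_star - mu) * cdf + sigma * pdf
```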

- Table 1: Distribution over DBN hyper-parameters for random sampling; options are separated by “or” (an illustrative conditional space in this spirit is sketched after this list).
- Table 2: The test-set classification error of the best model found by each search algorithm on each problem. Each search algorithm was allowed up to 200 trials. The manual searches used 82 trials for convex and 27 trials for MRBI.
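
To make Table 1's conditional structure concrete: which per-layer hyper-parameters exist depends on the sampled number of layers, which is why some variables are irrelevant in a given trial. A hedged sketch; every name and range below is illustrative, not the paper's exact Table 1:

```python
import random

def sample_dbn_config(rng):
    # The number of layers gates which per-layer variables exist at all,
    # giving the tree-structured ("sometimes irrelevant") search space.
    n_layers = rng.choice([1, 2, 3])
    config = {
        "n_layers": n_layers,
        "preprocessing": rng.choice(["raw", "zca"]),   # options separated by "or"
        "learning_rate": 10 ** rng.uniform(-3, 0),
    }
    for i in range(n_layers):
        config[f"layer{i}_size"] = rng.choice([128, 512, 2048])
        config[f"layer{i}_cd_epochs"] = rng.randint(1, 100)
    return config

# Configs with n_layers == 1 contain only layer0_* keys, so the layer-1 and
# layer-2 variables are absent (irrelevant) in those trials.
```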

## Funding

- This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), Compute Canada, and by the ANR-2010-COSI-002 grant of the French National Research Agency.

## References

- [1] H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In ICML 2007, pages 473–480, 2007.
- [2] G. E. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18:1527–1554, 2006.
- [3] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11:3371–3408, 2010.
- [4] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, November 1998.
- [5] N. Pinto, D. Doukhan, J. J. DiCarlo, and D. D. Cox. A high-throughput screening approach to discovering good forms of biologically inspired visual representation. PLoS Computational Biology, 5(11):e1000579, 2009.
- [6] A. Coates, H. Lee, and A. Ng. An analysis of single-layer networks in unsupervised feature learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop, 2010.
- [7] A. Coates and A. Y. Ng. The importance of encoding versus training with sparse coding and vector quantization. In Proceedings of the Twenty-Eighth International Conference on Machine Learning (ICML-11), 2011.
- [8] F. Hutter. Automated Configuration of Algorithms for Solving Hard Computational Problems. PhD thesis, University of British Columbia, 2009.
- [9] F. Hutter, H. Hoos, and K. Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In LION-5, 2011. Extended version available as UBC technical report TR-2010-10.
- [10] D. R. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21:345–383, 2001.
- [11] J. Villemonteix, E. Vazquez, and E. Walter. An informational approach to the global optimization of expensive-to-evaluate functions. Journal of Global Optimization, 2006.
- [12] N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In ICML, 2010.
- [13] J. Mockus, V. Tiesis, and A. Zilinskas. The application of Bayesian methods for seeking the extremum. In L. C. W. Dixon and G. P. Szego, editors, Towards Global Optimization, volume 2, pages 117–129. North Holland, New York, 1978.
- [14] C. E. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
- [15] D. Ginsbourger, D. Dupuy, A. Badea, L. Carraro, and O. Roustant. A note on the choice and the estimation of kriging models for the analysis of deterministic computer experiments. Applied Stochastic Models in Business and Industry, 25:115–131, 2009.
- [16] R. Bardenet and B. Kégl. Surrogating the surrogate: accelerating Gaussian Process optimization with mixtures. In ICML, 2010.
- [17] P. Larranaga and J. Lozano, editors. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Springer, 2001.
- [19] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. The Learning Workshop (Snowbird), 2011.
- [20] A. Hyvärinen and E. Oja. Independent component analysis: Algorithms and applications. Neural Networks, 13(4–5):411–430, 2000.
- [21] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 2012. Accepted.
- [22] C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
- [23] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, and Y. Bengio. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010.
