
Algorithms for Hyper-Parameter Optimization

James Bergstra, Rémi Bardenet, Yoshua Bengio, Balázs Kégl

NIPS, pp. 2546–2554, 2011


Abstract

Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyper-parameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible. […]

Introduction
  • Models such as Deep Belief Networks (DBNs) [2], stacked denoising autoencoders [3], convolutional networks [4], as well as classifiers based on sophisticated feature extraction techniques have from ten to perhaps fifty hyper-parameters, depending on how the experimenter chooses to parametrize the model, and how many hyper-parameters the experimenter chooses to fix at a reasonable default.
  • The difficulty of tuning these models makes published results difficult to reproduce and extend, and makes even the original investigation of such methods more of an art than a science
  • Recent results such as [5], [6], and [7] demonstrate that the challenge of hyper-parameter optimization in large and multilayer models is a direct impediment to scientific progress.
  • The results of [5] and [7] suggest that with current generation hardware such as large computer clusters and GPUs, the optimal allocation of CPU cycles includes more hyper-parameter exploration than has been typical in the machine learning literature
Highlights
  • This paper has introduced two sequential hyper-parameter optimization algorithms, and shown them to meet or exceed human performance and the performance of a brute-force random search in two difficult hyper-parameter optimization tasks involving Deep Belief Networks
  • Some of these variables are irrelevant roughly half of the time, depending on the values of other parameters. In this 32-dimensional search problem, the Tree-structured Parzen Estimator (TPE) algorithm presented here has uncovered new best results on both of these datasets that are significantly better than what Deep Belief Networks were previously believed to achieve
  • The Gaussian Process Approach and Tree-structured Parzen Estimator Approach algorithms are practical: the optimization for each dataset was done in just 24 hours using five GPU processors
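Both sequential algorithms choose each new trial by optimizing the Expected Improvement (EI) criterion over a model of the trials seen so far. The following is a brief reconstruction of that criterion and of the TPE simplification, in notation consistent with the paper (y* is a threshold on observed losses chosen so that γ = p(y < y*), and ℓ and g are the Parzen densities fit to the configurations below and above that threshold); this is a summary sketch, not a verbatim excerpt:

```latex
\mathrm{EI}_{y^{*}}(x) = \int_{-\infty}^{y^{*}} (y^{*} - y)\, p(y \mid x)\, dy,
\qquad
p(x \mid y) =
\begin{cases}
\ell(x) & \text{if } y < y^{*},\\
g(x)    & \text{if } y \ge y^{*},
\end{cases}
\qquad\Rightarrow\qquad
\mathrm{EI}_{y^{*}}(x) \propto \left( \gamma + \frac{g(x)}{\ell(x)}\,(1 - \gamma) \right)^{-1}.
```

In words: maximizing EI under the TPE model amounts to proposing points with a high ratio ℓ(x)/g(x), i.e. configurations that look likely under the good trials and unlikely under the bad ones.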
Results
  • TPE’s best result was significantly better than both manual search (19% error) and random search with 200 trials (17% error).
Conclusion
  • The trajectories (H) constructed by each algorithm up to 200 steps are illustrated in Figure 4, and compared with random search and the manual search carried out in [1].
  • On the MRBI dataset (10-way classification), random search was the worst performer (50% error), the GP approach and manual search approximately tied (47% error), while the TPE algorithm found a new best result (44% error).
  • Some of these variables are irrelevant roughly half of the time, depending on the values of other parameters.
  • In this 32-dimensional search problem, the TPE algorithm presented here has uncovered new best results on both of these datasets that are significantly better than what DBNs were previously believed to achieve.
  • Although the authors' results are only for DBNs, the methods are quite general and extend naturally to any hyper-parameter optimization problem in which the hyper-parameters are drawn from a measurable set (a minimal sketch of the TPE proposal step follows this list).
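To make the TPE proposal step concrete, here is a minimal, illustrative sketch for a single continuous hyper-parameter. It uses scipy's Gaussian KDE as a simple stand-in for the paper's adaptive Parzen estimators; the name suggest_tpe, the quantile gamma, the candidate count, and the fallback behaviour are assumptions made for illustration, not the paper's implementation:

```python
import numpy as np
from scipy.stats import gaussian_kde


def suggest_tpe(xs, losses, bounds, gamma=0.15, n_candidates=64, rng=None):
    """Illustrative TPE proposal step for one continuous hyper-parameter.

    Splits past trials at the gamma-quantile of the observed losses, fits a
    density l(x) to the "good" configurations and g(x) to the rest, then
    returns the candidate that maximizes l(x) / g(x), which is proportional
    to Expected Improvement under the TPE model.
    """
    rng = np.random.default_rng() if rng is None else rng
    xs, losses = np.asarray(xs, dtype=float), np.asarray(losses, dtype=float)
    y_star = np.quantile(losses, gamma)                  # loss threshold: good vs. bad
    good, bad = xs[losses <= y_star], xs[losses > y_star]
    if len(good) < 2 or len(bad) < 2:                    # not enough history yet:
        return rng.uniform(*bounds)                      # fall back to random exploration
    l_density, g_density = gaussian_kde(good), gaussian_kde(bad)  # Parzen-style estimates
    candidates = rng.uniform(*bounds, size=n_candidates)
    scores = l_density(candidates) / (g_density(candidates) + 1e-12)
    return candidates[np.argmax(scores)]
```

In use, one would evaluate a handful of random configurations first and then repeatedly call suggest_tpe on the accumulated (x, loss) history; the paper's actual TPE generalizes this to tree-structured spaces with discrete and conditional hyper-parameters, such as the DBN space of Table 1, and to adaptive rather than fixed-bandwidth Parzen windows.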
Tables
  • Table 1: Distribution over DBN hyper-parameters for random sampling; options separated by “or” are discrete choices (an illustrative sampler over such a conditional space follows this list).
  • Table 2: Test-set classification error of the best model found by each search algorithm on each problem. Each search algorithm was allowed up to 200 trials; the manual searches used 82 trials for convex and 27 trials for MRBI.
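Since Table 1 describes a prior with “or” choices, a random-sampling baseline over such a conditional space can be written as a small sampler. The hyper-parameter names, ranges, and defaults below are illustrative placeholders under assumed values, not the paper's exact Table 1:

```python
import random


def sample_dbn_config(rng=None):
    """Draw one DBN configuration from an illustrative conditional prior.

    Mirrors the shape of Table 1: "or" options are discrete choices, and some
    hyper-parameters only exist for particular values of other choices.
    All names and ranges here are placeholders, not the paper's actual prior.
    """
    rng = rng or random.Random()
    cfg = {
        "preprocessing": rng.choice(["raw", "zca"]),      # an "or" choice
        "n_layers": rng.choice([1, 2, 3]),
        "learning_rate": 10 ** rng.uniform(-3, 0),        # sampled log-uniformly
    }
    if cfg["preprocessing"] == "zca":
        cfg["zca_energy"] = rng.uniform(0.5, 1.0)         # only relevant when ZCA is chosen
    # per-layer sizes exist only for the layers that are actually used
    cfg["layer_sizes"] = [rng.choice([128, 256, 512, 1024]) for _ in range(cfg["n_layers"])]
    return cfg
```

Conditionally present parameters such as zca_energy here are exactly the variables the highlights and conclusion describe as sometimes irrelevant depending on the values of other parameters; TPE's tree-structured densities are designed to handle this structure.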
Funding
  • This work was supported by the Natural Sciences and Engineering Research Council of Canada, Compute Canada, and by the ANR-2010-COSI-002 grant of the French National Research Agency.
References
  • [1] H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In ICML 2007, pages 473–480, 2007.
  • [2] G. E. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18:1527–1554, 2006.
  • [3] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11:3371–3408, 2010.
  • [4] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, November 1998.
  • [5] N. Pinto, D. Doukhan, J. J. DiCarlo, and D. D. Cox. A high-throughput screening approach to discovering good forms of biologically inspired visual representation. PLoS Computational Biology, 5(11):e1000579, 2009.
  • [6] A. Coates, H. Lee, and A. Ng. An analysis of single-layer networks in unsupervised feature learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop, 2010.
  • [7] A. Coates and A. Y. Ng. The importance of encoding versus training with sparse coding and vector quantization. In Proceedings of the Twenty-eighth International Conference on Machine Learning (ICML-11), 2011.
  • [8] F. Hutter. Automated Configuration of Algorithms for Solving Hard Computational Problems. PhD thesis, University of British Columbia, 2009.
  • [9] F. Hutter, H. Hoos, and K. Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In LION-5, 2011. Extended version as UBC Tech report TR-2010-10.
  • [10] D. R. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21:345–383, 2001.
  • [11] J. Villemonteix, E. Vazquez, and E. Walter. An informational approach to the global optimization of expensive-to-evaluate functions. Journal of Global Optimization, 2006.
  • [12] N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In ICML, 2010.
  • [13] J. Mockus, V. Tiesis, and A. Zilinskas. The application of Bayesian methods for seeking the extremum. In L. C. W. Dixon and G. P. Szego, editors, Towards Global Optimization, volume 2, pages 117–129. North Holland, New York, 1978.
  • [14] C. E. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
  • [15] D. Ginsbourger, D. Dupuy, A. Badea, L. Carraro, and O. Roustant. A note on the choice and the estimation of kriging models for the analysis of deterministic computer experiments. 25:115–131, 2009.
  • [16] R. Bardenet and B. Kegl. Surrogating the surrogate: accelerating Gaussian Process optimization with mixtures. In ICML, 2010.
  • [17] P. Larranaga and J. Lozano, editors. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Springer, 2001.
  • [19] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. The Learning Workshop (Snowbird), 2011.
  • [20] A. Hyvarinen and E. Oja. Independent component analysis: Algorithms and applications. Neural Networks, 13(4–5):411–430, 2000.
  • [21] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. JMLR, 2012. Accepted.
  • [22] C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
  • [23] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, and Y. Bengio. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010.