Annealed dropout training of deep networks

Spoken Language Technology Workshop (2014)

Abstract
Recently it has been shown that when training neural networks on a limited amount of data, randomly zeroing, or "dropping out," a fixed percentage of the outputs of a given layer for each training case can improve test set performance significantly. Dropout training discourages the detectors in the network from co-adapting, which limits the capacity of the network and prevents overfitting. In this paper we show that annealing the dropout rate from a high initial value to zero over the course of training can substantially improve the quality of the resulting model. As dropout (approximately) implements model aggregation over an exponential number of networks, this procedure effectively initializes the ensemble of models that will be learned during a given iteration of training with an ensemble of models that has a lower average number of neurons per network and higher variance in the number of neurons per network, which regularizes the structure of the final model toward models that avoid unnecessary co-adaptation between neurons. Importantly, this regularization procedure is stochastic, and so promotes the learning of "balanced" networks with neurons that have high average entropy, and low variance in their entropy, by smoothly transitioning from "exploration" with high learning rates to "fine tuning" with full support for co-adaptation between neurons where necessary. Experimental results demonstrate that annealed dropout leads to significant reductions in word error rate over standard dropout training.
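As a rough illustration of the schedule the abstract describes, the sketch below anneals a dropout rate from a high initial value to zero over training. The linear decay, the layer sizes, the epoch counts, and the names annealed_dropout_rate, initial_rate, and anneal_epochs are illustrative assumptions for this sketch, not the authors' implementation; only the idea of decaying the rate to zero comes from the paper.

# Minimal sketch (assumed details): linearly anneal the dropout rate to zero.
import torch.nn as nn

def annealed_dropout_rate(epoch, initial_rate=0.5, anneal_epochs=20):
    # Decay the dropout probability linearly, reaching 0 at `anneal_epochs`.
    return max(0.0, initial_rate * (1.0 - epoch / anneal_epochs))

# Illustrative feed-forward network; layer sizes are arbitrary.
model = nn.Sequential(
    nn.Linear(440, 1024), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(1024, 2000),
)

for epoch in range(30):
    p = annealed_dropout_rate(epoch)
    # Set every Dropout layer to the current annealed rate before this epoch.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p
    # ...standard training loop for one epoch goes here...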
Keywords
neural nets, annealed dropout training, deep neural networks, regularization procedure, deterministic annealing, dropout training, maxout networks, model aggregation