Partial data permutation for training deep neural networks

2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID)(2020)

引用 0|浏览35
暂无评分
摘要
Random data permutation is considered as a best practice for training deep neural networks. When the input is large, permuting the full dataset is costly and limits scaling on distributed systems. Some practitioners use partial or no permutation that may potentially result in poor convergence.We propose a partitioned data permutation scheme as a low-cost alternative to full data permutation. Analyzing their entropy, we show that the two sampling schemes are asymptotically identical. We also show with minibatch SGD, both sampling schemes produce unbiased estimators of the true gradient. In addition, they have the same bound on the second moment of the gradient. Thus they have similar convergence properties. Our experiments confirm that SGD has similar training performance in practice with both sampling schemes.We further show that due to inherent randomness such as data augmentation and dropout in the training, even faster sampling schemes than partial permutation such as sequential sampling can achieve good performance. However, if no extra randomness is present in training, sampling schemes with low entropy can indeed degrade performance significantly.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要