AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-Deep Neural Networks

2019 IEEE 37th International Conference on Computer Design (ICCD)

Abstract
Under mainstream DL frameworks, scarce GPU memory is the primary bottleneck that limits both the trainability and the training efficiency of ultra-deep neural networks (UDNNs). Prior memory-optimization work focuses on removing the trainability restriction but leaves training efficiency out of consideration. To fill this gap, we present AccUDNN, an accelerator that makes full use of finite GPU memory to speed up UDNN training. AccUDNN comprises two modules: a memory optimizer and a hyperparameter tuner. The memory optimizer develops a novel performance-model-guided dynamic swap-out/in strategy that first ensures trainability and then remedies the efficiency degradation seen in other swapping strategies. The hyperparameter tuner is then designed to explore the efficiency-optimal minibatch size and the matched learning rate after the dynamic swapping strategy is applied. Evaluations demonstrate that AccUDNN cuts the GPU memory requirement of ResNet-152 from more than 24 GB to 8 GB. In turn, given a 12 GB GPU memory budget, the efficiency-optimal minibatch size can reach 4.2x that of Caffe, improving the scaling efficiency (speedup) of an 8-GPU cluster by 1.9x.
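The core idea of dynamic swap-out/in can be illustrated with a toy model: activations produced in the forward pass that exceed the GPU memory budget are offloaded to host memory and fetched back when the backward pass needs them. The sketch below is an assumption-laden simplification, not the paper's implementation; all names (`SwapManager`, `save_activation`, `load_activation`) and the fit-or-offload policy are hypothetical, and real systems overlap the transfers with computation as guided by a performance model.

```python
class SwapManager:
    """Toy model of dynamic activation swapping under a GPU memory budget.
    Hypothetical sketch only; not the authors' AccUDNN implementation."""

    def __init__(self, gpu_budget_bytes):
        self.budget = gpu_budget_bytes
        self.used = 0
        self.on_gpu = {}   # layer name -> (tensor, size): simulated device memory
        self.on_host = {}  # layer name -> (tensor, size): simulated pinned host memory

    def save_activation(self, layer, tensor, size):
        # Forward pass: keep the activation on the GPU if it fits,
        # otherwise swap it out to host memory.
        if self.used + size <= self.budget:
            self.on_gpu[layer] = (tensor, size)
            self.used += size
        else:
            self.on_host[layer] = (tensor, size)

    def load_activation(self, layer):
        # Backward pass: return the activation from GPU memory,
        # or swap it back in from the host.
        if layer in self.on_gpu:
            tensor, size = self.on_gpu.pop(layer)
            self.used -= size
            return tensor
        tensor, _ = self.on_host.pop(layer)
        return tensor

# Usage: a 2-byte "budget" forces the third activation out to the host.
mgr = SwapManager(gpu_budget_bytes=2)
mgr.save_activation("conv1", "A1", 1)
mgr.save_activation("conv2", "A2", 1)
mgr.save_activation("conv3", "A3", 1)  # exceeds budget -> swapped out
print(sorted(mgr.on_host))             # ['conv3']
```

In this simplified policy the decision is purely greedy; the paper's contribution is precisely that the swap schedule is instead guided by a performance model, so transfers hide behind computation rather than stalling it.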