Auto-Parallelizing Deep Learning for Multi-machine, Multi-GPU Environments

semanticscholar(2017)

Abstract
Being able to use a cluster of GPU resources is especially favorable for machine learning researchers, as training neural nets typically requires at least tens of GPUs to finish within a feasible time. Neural networks can be made trainable on multiple devices, i.e., CPUs or GPUs spread across multiple machines, to speed up training and improve convergence. This is further facilitated by the introduction of deep learning systems such as TensorFlow [2], MXNet [4] and Caffe2 [1]; such frameworks allow users, to a certain degree, to easily utilize multi-machine, multi-GPU environments to train networks. Unfortunately, extending single-machine, single-GPU neural net models to work in distributed environments is not a trivial job for most machine learning researchers. In most deep learning frameworks [1, 2, 4], a deep learning job is represented as a computation graph, and the computation graph needs to be changed for distributed environments: model parameters must be partitioned across machines to balance out communication overheads, and graph operators must be replicated and assigned to devices so that hardware resources can cooperate to train a model. To this end, we introduce Parallax, an auto-parallelization module that helps machine learning researchers extend their single-device model code to run with data parallelism across multiple GPUs and machines. Parallax receives a single-device graph, analyzes it, and then transforms it into a multi-machine, multi-GPU version according to the computed settings. The automatically transformed graph can then be run in distributed environments. Preliminary experiments show that with the help of Parallax, the ResNet-50 [9] model can be trained on a total of 12 GPUs across 3 machines with sublinear scale-out improvements in computation throughput. We also discuss several extensions to Parallax, including the application of model-parallelism strategies to boost performance for models with relatively large parameters, as well as hybrid-parallelism strategies that utilize both data parallelism and model parallelism.
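
The abstract describes Parallax's core mechanism (analyzing a single-device computation graph and rewriting it into a data-parallel, multi-GPU form) but not its API. The sketch below is therefore only a hypothetical illustration, assuming a TensorFlow 1.x graph-mode runtime, of the manual in-graph replication and gradient averaging that such a transformation automates; it is not Parallax's actual interface, and the toy model, device strings, NUM_GPUS, and batch size are assumptions made for illustration.

import tensorflow as tf  # assumes a TF 1.x (graph-mode) runtime

NUM_GPUS = 2  # assumed number of local GPUs


def single_device_model(inputs, labels):
    """Stands in for the user's unmodified single-GPU model code."""
    logits = tf.layers.dense(inputs, units=10)
    return tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)


optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
tower_grads = []

for gpu_id in range(NUM_GPUS):
    # Replicate the user's graph once per GPU; AUTO_REUSE shares the model
    # parameters across replicas. Choosing this placement and replication by
    # hand is the kind of graph surgery the abstract says Parallax automates.
    with tf.device('/gpu:%d' % gpu_id), \
         tf.variable_scope('model', reuse=tf.AUTO_REUSE):
        inputs = tf.random_normal([32, 128])                 # one data shard
        labels = tf.random_uniform([32], maxval=10, dtype=tf.int32)
        loss = single_device_model(inputs, labels)
        tower_grads.append(optimizer.compute_gradients(loss))

# Average each parameter's gradient across replicas and apply the result once
# to the shared variables (synchronous data parallelism).
averaged = []
for grads_and_vars in zip(*tower_grads):
    grads = [g for g, _ in grads_and_vars]
    averaged.append((tf.reduce_mean(tf.stack(grads), axis=0),
                     grads_and_vars[0][1]))
train_op = optimizer.apply_gradients(averaged)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)  # one synchronous data-parallel training step

In this manual version the user must decide device placement, variable sharing, and gradient aggregation for every model by hand; the abstract's premise is that Parallax derives these decisions automatically from the unmodified single-device graph.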