Adaptive partitioning and efficient scheduling for distributed DNN training in heterogeneous IoT environment


With the proliferation of Internet-of-Things (IoT) devices, there is a growing trend toward training a deep neural network (DNN) model with pipeline parallelism across resource-constrained IoT devices. To ensure model convergence and accuracy, synchronous pipeline parallelism is usually adopted. However, a synchronous pipeline can incur a long waiting time because it aggregates the gradients of all microbatches. An adaptive partitioning and efficient scheduling scheme for DNN models in heterogeneous IoT environments is therefore urgently needed. To address this problem, we propose a policy gradient based model partitioning and scheduling scheme (PG-MPSS) to minimize the per-iteration training time. More specifically, we first design a double-network framework to partition and schedule a DNN model. Then, we adopt a policy gradient algorithm to update the double-network parameters, aiming to learn an optimal double-network model. We conduct extensive experiments comparing the DNN training time of the PG-MPSS scheme with that of five baseline algorithms, namely Dynamic Programming (DP), Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Average&Greedy (AG), and Proximal Policy Optimization (PPO), under different experimental settings. The experimental results demonstrate that the PG-MPSS scheme can greatly expedite synchronous pipeline training of a DNN model.
Synchronous pipeline parallelism, Policy gradient, Heterogeneous IoT environment
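
To make the objective in the abstract concrete, the following is a minimal sketch of how per-iteration time in a synchronous pipeline depends on the partition. It is not the PG-MPSS scheme: it assumes a simplified cost model (per-iteration time is approximately the sum of stage times plus `(num_microbatches - 1)` times the slowest stage, with communication ignored) and swaps in an exhaustive search over contiguous partitions in place of the policy gradient learner. All names (`stage_times`, `iteration_time`, `best_partition`) and the example numbers are hypothetical.

```python
from itertools import combinations


def stage_times(layer_costs, speeds, cuts):
    """Per-microbatch time of each stage for a contiguous partition.

    `cuts` lists the layer indices where a new stage starts, e.g. (1, 3)
    assigns layers [0:1], [1:3], [3:] to devices 0, 1, 2.
    Assumption: stage time = total layer cost / device speed.
    """
    bounds = [0, *cuts, len(layer_costs)]
    return [
        sum(layer_costs[bounds[i]:bounds[i + 1]]) / speeds[i]
        for i in range(len(speeds))
    ]


def iteration_time(layer_costs, speeds, cuts, num_microbatches):
    """Assumed synchronous-pipeline cost model, communication ignored.

    One microbatch traverses every stage (sum of stage times); each of the
    remaining microbatches is paced by the slowest stage.
    """
    times = stage_times(layer_costs, speeds, cuts)
    return sum(times) + (num_microbatches - 1) * max(times)


def best_partition(layer_costs, speeds, num_microbatches):
    """Exhaustive search over contiguous partitions (small models only)."""
    n_layers, n_devices = len(layer_costs), len(speeds)
    candidates = combinations(range(1, n_layers), n_devices - 1)
    return min(
        candidates,
        key=lambda c: iteration_time(layer_costs, speeds, c, num_microbatches),
    )


# Hypothetical setting: six equal-cost layers, three devices of unequal speed.
layer_costs = [4, 4, 4, 4, 4, 4]
speeds = [1, 2, 3]
cuts = best_partition(layer_costs, speeds, num_microbatches=4)
print(cuts, iteration_time(layer_costs, speeds, cuts, 4))
```

Under this toy model, an even split of layers leaves the slowest device as the bottleneck, while the searched partition gives faster devices more layers. Exhaustive search grows combinatorially with model depth and device count, which is one reason a learned policy is attractive for larger settings.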