Effectiveness of Moldable and Malleable Scheduling in Deep Learning Tasks

2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS)

Cited by 5 | Views 47
Abstract
Research and development of deep learning (DL) applications often involves exhaustive trial-and-error, which demands that shared computational resources, especially GPUs, be efficiently allocated. Most DL tasks are moldable or malleable (i.e., the number of allocated GPUs can be changed before or during execution). However, conventional batch schedulers do not take advantage of DL tasks' moldability/malleability, inhibiting speedup when some GPU resources are unallocated. Another opportunity for speedup is to run multiple tasks concurrently on one GPU, which may improve the overall throughput because a single task does not always fully utilize the GPU's computational resources. We propose designing a batch scheduling system that exploits these opportunities to accelerate DL tasks. As a first step, this study conducts an extensive case study to evaluate the speedup of DL tasks when a scheduler treats them as moldable or malleable. That is, the scheduler adjusts the number of GPUs to be (or already) allocated to a task in response to the fluctuating availability of GPUs. Simulations using our real workload trace show that if the scheduler can allocate 1–4 GPUs to a task or assign 1–4 tasks to a GPU, then the average flow time of moldable/malleable DL tasks is shortened by at least 15.1%/42.5%, respectively, compared to a Rigid FCFS schedule in which one GPU is allocated to each task.
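The abstract contrasts a rigid FCFS baseline (one GPU per task) with moldable scheduling, where each task's GPU count is fixed at launch based on current availability. The sketch below is a minimal illustration of that comparison, not the paper's simulator: the synthetic workload, the sub-linear speedup model, and the 8-GPU cluster size are all assumptions made for illustration, and the malleable case (resizing tasks mid-run) is not modeled.

```python
import heapq

def simulate(tasks, n_gpus, max_gpus_per_task, speedup):
    """Toy FCFS simulation: a task with `work` units runs for
    work / speedup(g) time on g GPUs. Returns the average flow time
    (completion time minus arrival time) over all tasks."""
    free, clock, total_flow = n_gpus, 0.0, 0.0
    running = []  # min-heap of (finish_time, gpus_held)
    for arrival, work in sorted(tasks):
        t = max(clock, arrival)  # FCFS: launch times never move backwards
        # Release GPUs held by tasks that finished by time t.
        while running and running[0][0] <= t:
            free += heapq.heappop(running)[1]
        # If every GPU is busy, wait for the next completion.
        while free == 0:
            finish, g = heapq.heappop(running)
            t = max(t, finish)
            free += g
        # Moldable policy: take up to max_gpus_per_task idle GPUs at launch.
        g = min(free, max_gpus_per_task)
        free -= g
        finish = t + work / speedup(g)
        heapq.heappush(running, (finish, g))
        total_flow += finish - arrival
        clock = t
    return total_flow / len(tasks)

# Illustrative comparison on a synthetic workload (not the paper's trace).
tasks = [(i * 0.5, 10.0) for i in range(32)]   # (arrival_time, work)
speedup = lambda g: g ** 0.8                   # assumed sub-linear scaling
rigid = simulate(tasks, n_gpus=8, max_gpus_per_task=1, speedup=speedup)
moldable = simulate(tasks, n_gpus=8, max_gpus_per_task=4, speedup=speedup)
print(f"average flow time: rigid={rigid:.2f}, moldable={moldable:.2f}")
```

With the assumed sub-linear speedup, the moldable policy shortens flow time whenever GPUs would otherwise sit idle, which is the effect the paper quantifies on its real workload trace.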
Keywords
Task analysis, Graphics processing units, Artificial neural networks, Parallel processing, Training, Runtime, Scheduling