CAPSlog: Scalable Memory-Centric Partitioning for Pipeline Parallelism

Henk Dreuning, Anna Badia Liokouras, Xiaowei Ouyang, Henri E. Bal, Rob V. van Nieuwpoort

International Euromicro Conference on Parallel, Distributed and Network-Based Processing (2024)

Abstract
Pipeline-parallel training has emerged as a popular method to train large Deep Neural Networks (DNNs), as it allows the use of the combined compute power and memory capacity of multiple Graphics Processing Units (GPUs). However, with the sustained increase in Deep Learning (DL) model sizes, pipeline parallelism provides only a partial solution to the memory bottleneck in large-scale DNN training. Careful partitioning of the DL model over the available GPUs based on memory usage is required to further alleviate the memory bottleneck and train larger DNNs. mCAP is such a memory-oriented partitioning approach for pipeline-parallel systems, but it does not scale to models with many layers and very large hardware setups, as it requires extensive profiling and fails to efficiently navigate the partitioning space to find the most memory-friendly partitioning. In this work, we propose CAPSlog, a scalable memory-centric partitioning approach that can recommend model partitionings for larger and more heterogeneous DL models and for larger hardware setups than existing approaches. CAPSlog introduces a new profiling method and a new, much more scalable algorithm for recommending memory-efficient partitionings. CAPSlog reduces the profiling time by 67% compared to existing approaches, searches the partitioning space for the optimal solution orders of magnitude faster, and can train significantly larger models.
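
To make the partitioning problem concrete, the sketch below shows a generic memory-balanced contiguous partitioning of layers over pipeline stages, using a binary search on the per-stage memory cap combined with a greedy feasibility check. This is only an illustration of the kind of search such a partitioner performs; it is not CAPSlog's profiling or partitioning algorithm, and the per-layer memory figures and function names are hypothetical placeholders.

# Illustrative sketch only: memory-balanced contiguous partitioning of DNN
# layers over pipeline stages. NOT the CAPSlog algorithm from the paper;
# per-layer memory figures below are hypothetical.
from typing import List, Tuple


def fits(layer_mem: List[float], num_stages: int, cap: float) -> bool:
    """Greedily check whether the layers can be split into at most
    `num_stages` contiguous groups whose per-group memory stays under `cap`."""
    stages, current = 1, 0.0
    for m in layer_mem:
        if m > cap:
            return False
        if current + m > cap:
            stages += 1
            current = m
        else:
            current += m
    return stages <= num_stages


def partition(layer_mem: List[float], num_stages: int) -> Tuple[float, List[List[int]]]:
    """Binary-search the smallest feasible per-stage memory cap, then rebuild
    the stage boundaries, minimizing the maximum per-stage memory footprint."""
    lo, hi = max(layer_mem), sum(layer_mem)
    while hi - lo > 1e-6:
        mid = (lo + hi) / 2
        if fits(layer_mem, num_stages, mid):
            hi = mid
        else:
            lo = mid
    # Reconstruct the assignment of layer indices to stages under cap `hi`.
    cap, stages, current = hi, [[]], 0.0
    for i, m in enumerate(layer_mem):
        if current + m > cap and stages[-1]:
            stages.append([])
            current = 0.0
        stages[-1].append(i)
        current += m
    return cap, stages


if __name__ == "__main__":
    # Hypothetical per-layer memory use (GB) for a 12-layer model.
    layer_mem = [1.2, 0.9, 2.4, 1.1, 3.0, 1.8, 2.2, 0.7, 1.5, 2.9, 1.0, 0.8]
    cap, stages = partition(layer_mem, num_stages=4)
    print(f"peak per-stage memory ~ {cap:.2f} GB")
    for s, layers in enumerate(stages):
        print(f"stage {s}: layers {layers}")

In a real pipeline-parallel setting, the per-layer memory numbers would come from profiling (weights, activations, optimizer state), which is exactly the step the paper argues must itself be made scalable.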
Keywords
Deep Learning, Pipeline Parallelism, Memory