DistMind: Efficient Resource Disaggregation for Deep Learning Workloads

IEEE/ACM Transactions on Networking (2024)

Abstract
Deep learning (DL) systems suffer from low resource utilization due to 1) the monolithic server model that tightly couples compute and memory, and 2) limited sharing between different inference applications, and across inference and training, because of strict service level objectives (SLOs). To address this problem, we present DistMind, a disaggregated DL system that enables efficient multiplexing of DL applications with near-optimal resource utilization. DistMind decouples compute from host memory and exposes the abstractions of a GPU pool and a memory pool, each of which can be independently provisioned. The key challenge is to dynamically allocate GPU resources to different applications based on their real-time demands while meeting strict SLOs. We tackle this challenge by exploiting the power of high-speed 100 Gbps networks and designing three-stage pipelining, cache-aware load balancing, and DNN-aware sharding mechanisms based on the characteristics of DL workloads, to achieve millisecond-scale application loading overhead and improve system efficiency. We have implemented a prototype of DistMind and integrated it with PyTorch. Experimental results on AWS EC2 show that DistMind achieves near 100% resource utilization, and compared with NVIDIA MPS and Ray, DistMind improves throughput by up to 279% and reduces inference latency by up to 94%.
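To make the cache-aware load balancing idea concrete, below is a minimal, hypothetical Python sketch (not the authors' implementation): requests are routed to GPU workers that already hold the requested model in their cache, and only fall back to the least-loaded cold GPU (which would then fetch model shards from the memory pool) on a miss. All class and function names here are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class GpuWorker:
    """A GPU in the disaggregated GPU pool, tracking models resident in its cache."""
    worker_id: int
    cached_models: set = field(default_factory=set)
    inflight: int = 0  # requests currently queued on this GPU

class CacheAwareBalancer:
    """Prefer warm GPUs that already cache the requested model; otherwise pick
    the least-loaded GPU and mark that it must load the model from the memory pool."""

    def __init__(self, workers):
        self.workers = workers

    def route(self, model_name: str) -> GpuWorker:
        warm = [w for w in self.workers if model_name in w.cached_models]
        candidates = warm if warm else self.workers
        target = min(candidates, key=lambda w: w.inflight)
        if model_name not in target.cached_models:
            # Cache miss: in DistMind the worker would fetch DNN shards from the
            # memory pool and overlap loading with compute (not modeled here).
            target.cached_models.add(model_name)
        target.inflight += 1
        return target

# Usage: two GPUs, with "resnet50" already cached on GPU 0.
workers = [GpuWorker(0, {"resnet50"}), GpuWorker(1)]
balancer = CacheAwareBalancer(workers)
print(balancer.route("resnet50").worker_id)   # -> 0, warm cache preferred
print(balancer.route("bert-base").worker_id)  # -> 1, least-loaded cold GPU
```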
Keywords
Machine learning systems, resource disaggregation, resource management, scheduling, deep learning