Kube-Knots: Resource Harvesting through Dynamic Container Orchestration in GPU-based Datacenters

2019 IEEE International Conference on Cluster Computing (CLUSTER)(2019)

Cited by 26 | Viewed 46
Abstract
Compute heterogeneity is increasingly gaining prominence in modern datacenters due to the addition of accelerators like GPUs and FPGAs. We observe that datacenter schedulers are agnostic of these emerging accelerators, especially their resource utilization footprints, and are thus not well equipped to dynamically provision them based on application needs. State-of-the-art datacenter schedulers fail to provide fine-grained resource guarantees for latency-sensitive tasks that are GPU-bound. For GPUs specifically, this results in resource fragmentation and interference, leading to poor utilization of allocated GPU resources. Furthermore, GPUs exhibit highly linear energy efficiency with respect to utilization, so proactive management of these resources is essential to keep operational costs low while ensuring end-to-end Quality of Service (QoS) for user-facing queries.

Towards addressing the GPU orchestration problem, we build Knots, a GPU-aware resource orchestration layer, and integrate it with the Kubernetes container orchestrator to build Kube-Knots. Kube-Knots dynamically harvests spare compute cycles through dynamic container orchestration, enabling co-location of latency-critical and batch workloads while improving overall resource utilization. We design and evaluate two GPU-based scheduling techniques to schedule datacenter-scale workloads through Kube-Knots on a ten-node GPU cluster. Our proposed Correlation Based Prediction (CBP) and Peak Prediction (PP) schemes together improve both average and 99th-percentile cluster-wide GPU utilization by up to 80% for HPC workloads. In addition, CBP+PP improves the average job completion time (JCT) of deep learning workloads by up to 36% compared to state-of-the-art schedulers. This leads to 33% cluster-wide energy savings on average across three different workloads compared to state-of-the-art GPU-agnostic schedulers.
Further, the proposed PP scheduler guarantees the end-to-end QoS for latency-critical queries by reducing QoS violations by up to 53% when compared to state-of-the-art GPU schedulers.
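The co-location decision described above can be illustrated with a minimal, hypothetical sketch. The function names, the sliding-window peak heuristic, and the capacity threshold below are illustrative assumptions, not the paper's actual CBP/PP algorithms: the idea is simply that a batch container is admitted onto a GPU running a latency-critical workload only if the predicted peak utilizations of both fit within the device's capacity.

```python
def peak_predict(util_history, window=5):
    """Peak Prediction (PP) sketch (hypothetical): forecast near-term
    peak GPU utilization as the maximum over a recent sliding window
    of observed utilization samples (percent)."""
    return max(util_history[-window:])

def can_colocate(lc_history, batch_history, capacity=100.0):
    """Admission check sketch (hypothetical): co-locate a batch
    container with a latency-critical (LC) workload only if the sum
    of their predicted peaks stays within GPU capacity, avoiding
    interference that would violate the LC workload's QoS."""
    return peak_predict(lc_history) + peak_predict(batch_history) <= capacity

# Example: an LC workload peaking at 80% leaves room for a light
# batch job (peak 15%) but not a heavy one (peak 42%).
lc = [60, 70, 65, 80, 75]
print(can_colocate(lc, [10, 15, 12, 14, 13]))  # light batch job fits
print(can_colocate(lc, [30, 35, 40, 38, 42]))  # heavy batch job rejected
```

A production scheduler would replace the window-maximum with a proper predictor and read live utilization from the GPU driver, but the admission structure stays the same.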
Keywords
end-to-end QoS,latency-critical queries,Kube-Knots,dynamic container orchestration,GPU-based datacenters,compute heterogeneity,modern datacenters,resource utilization footprints,datacenter schedulers,fine-grained resource guarantees,latency-sensitive tasks,GPU-bound,resource fragmentation,resource interference,proactive management,GPU orchestration problem,GPU-aware resource orchestration layer,Kubernetes container orchestrator,spare compute cycles,latency-critical batch workloads,datacenter-scale workloads,node GPU cluster,percentile cluster-wide GPU utilization,PP scheduler,GPU schedulers,quality of service,user-facing queries,FPGAs,GPU-agnostic schedulers,cluster-wide energy savings,correlation based prediction,CBP,peak prediction,PP schemes,HPC workloads,job completion times,JCT,deep learning workloads,linear energy efficiency