Load Balancing With Multi-Level Signals for Lossless Datacenter Networks

IEEE/ACM Transactions on Networking(2024)

引用 0|浏览2
暂无评分
摘要
Various datacenter network (DCN) load balancing schemes have been proposed in the past decade. Unfortunately, most of these solutions designed for lossy DCNs do not work well for Priority Flow Control (PFC) enabled lossless DCNs, primarily due to the reason that the individual congestion signals used in these solutions, e.g., link load, queue length, Round Trip Time (RTT) and Explicit Congestion Notification (ECN), may not be able to correctly or timely reflect the hop-by-hop PFC pausing. This paper first reveals the above problems via extensive experiments, and then based on the insights learned, we present Proteus, a PFC-aware load balancing scheme that is resilient to PFC pausing by exploring a combination of multi-level congestion signals. At its heart, Proteus leverages RTT-level signals (i.e., RTT and link utilization) to detect path status for initial routing decision, and exploits sub-RTT level signal (i.e., cumulative sojourn time) to reflect instantaneous PFC pausing and make timely rerouting choices based on the idea of better-late-than-never. We have implemented Proteus in the hardware programmable switch. Our testbed experiments as well as large-scale simulations show that Proteus can effectively handle PFC pausing under realistic workloads and achieve up to 35%, 31%, 28%, 22% and 46%, 42%, 34%, 29% better average FCT and $99^{th}$ percentile FCT than CONGA, DRILL, Hermes and MP-RDMA, respectively.
更多
查看译文
关键词
Datacenter,lossless networks,load balancing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要