PLB: congestion signals are simple and effective for network load balancing

SIGCOMM '22: Proceedings of the ACM SIGCOMM 2022 Conference (2022)

Abstract
We present a new, host-based design for link load balancing and report the first experiences of link imbalance in datacenters. Our design, PLB (Protective Load Balancing), builds on transport protocols and ECMP/WCMP to reduce network hotspots. PLB randomly changes the paths of connections that experience congestion, preferring to repath after idle periods to minimize packet reordering. It repaths a connection by changing the IPv6 Flow Label on its packets, which switches include as part of ECMP/WCMP. Across hosts, this action drives down hotspots in the network, and lowers the latency of RPCs. PLB is used fleetwide at Google for TCP and Pony Express traffic. We could deploy it when other designs were infeasible because PLB requires only small transport modifications and switch configuration changes, and is backwards-compatible. It has produced excellent gains: the median utilization imbalance of highly-loaded ToR uplinks in Google datacenters fell by 60%, packet drops correspondingly fell by 33%, and the tail latency (99p) of small RPCs fell by 20%. PLB is also a general solution that works for settings from datacenters to backbone networks, as well as different transports.
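The repathing rule described in the abstract can be illustrated with a minimal sketch. It assumes the transport reports, once per round trip, the fraction of packets that carried a congestion signal and whether the connection is currently idle. The class name, the method names, and all three thresholds are hypothetical choices for illustration, not values or interfaces taken from the paper.

```python
import random
from dataclasses import dataclass, field

FLOW_LABEL_BITS = 20  # width of the IPv6 Flow Label field


def random_flow_label(rng: random.Random) -> int:
    """Draw a random nonzero 20-bit flow label."""
    return rng.randrange(1, 1 << FLOW_LABEL_BITS)


@dataclass
class PlbConnection:
    """Per-connection repathing state (hypothetical names and thresholds)."""
    rng: random.Random = field(default_factory=random.Random)
    congestion_threshold: float = 0.5  # assumed: a round counts as congested above this
    idle_repath_rounds: int = 3        # assumed: congested rounds before an idle repath
    forced_repath_rounds: int = 12     # assumed: congested rounds before repathing anyway
    flow_label: int = 0
    congested_rounds: int = 0

    def __post_init__(self) -> None:
        if self.flow_label == 0:
            self.flow_label = random_flow_label(self.rng)

    def on_round_end(self, marked_fraction: float, is_idle: bool) -> bool:
        """Update state after one round trip; return True if the path changed."""
        if marked_fraction < self.congestion_threshold:
            self.congested_rounds = 0  # congestion cleared, stay on this path
            return False
        self.congested_rounds += 1
        # Prefer to repath while idle: no packets in flight means no reordering.
        if is_idle and self.congested_rounds >= self.idle_repath_rounds:
            return self._repath()
        # Under persistent congestion, repath even if the connection is busy.
        if self.congested_rounds >= self.forced_repath_rounds:
            return self._repath()
        return False

    def _repath(self) -> bool:
        """Pick a different random flow label so ECMP/WCMP hashing may pick a new path."""
        old = self.flow_label
        while self.flow_label == old:
            self.flow_label = random_flow_label(self.rng)
        self.congested_rounds = 0
        return True
```

In this sketch a host would call `on_round_end` once per round trip and stamp `flow_label` on outgoing packets. Because switches include the Flow Label in their ECMP/WCMP hash, a new label is likely, though not guaranteed, to map the connection to a different path.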
Keywords
Congestion control, Datacenter fabric, Load balancing, Distributed