Improving Network Availability with Protective ReRoute

David Wetherall, Abdul Kabbani, Van Jacobson, Jim Winget, Yuchung Cheng, Charles B. Morrey, Uma Moravapalle, Phillipa Gill, Steven Knight, Amin Vahdat

Proceedings of the ACM SIGCOMM 2023 Conference (SIGCOMM 2023), 2023

Abstract
We present PRR (Protective ReRoute), a transport technique for shortening user-visible outages that complements routing repair. It can be added to any transport to provide benefits in multipath networks. PRR responds to flow connectivity failure signals, e.g., retransmission timeouts, by changing the FlowLabel on packets of the flow, which causes switches and hosts to choose a different network path that may avoid the outage. To enable it, we shifted our IPv6 network architecture to use the FlowLabel, so that hosts can change the paths of their flows without application involvement. PRR is deployed fleetwide at Google for TCP and Pony Express, where it has been protecting all production traffic for several years. It is also available to our Cloud customers. We find it highly effective for real outages. In a measurement study on our network backbones, adding PRR reduced the cumulative region-pair outage time for RPC traffic by 63-84%. This is the equivalent of adding 0.4-0.8 "nines" of availability.
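
The mechanism described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the deployed implementation: the function names (`ecmp_path`, `repath`, `send_with_prr`, `added_nines`), the use of SHA-256 as a stand-in for a switch's ECMP hash, the retry limit, and the example addresses are all hypothetical. The only details taken from outside the abstract are that the IPv6 FlowLabel is a 20-bit field and that a zero value means "unlabelled". The last helper shows one way to read the "nines" figure: cutting outage time by a fraction r multiplies unavailability by (1 - r), which adds -log10(1 - r) nines.

```python
import hashlib
import math
import random

FLOWLABEL_BITS = 20  # the IPv6 FlowLabel field is 20 bits wide


def ecmp_path(five_tuple, flow_label, num_paths):
    """Toy ECMP: hash the flow identifiers plus FlowLabel onto one path index.

    Assumption: SHA-256 stands in for whatever hash a real switch uses.
    """
    key = repr((five_tuple, flow_label)).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_paths


def repath(flow_label, rng):
    """Draw a new non-zero FlowLabel that differs from the current one."""
    while True:
        candidate = rng.randrange(1, 1 << FLOWLABEL_BITS)
        if candidate != flow_label:
            return candidate


def send_with_prr(five_tuple, failed_paths, num_paths, max_timeouts=32, seed=0):
    """Simulate a flow that changes its FlowLabel after each timeout.

    Returns (working_path, timeouts_seen), or (None, max_timeouts) if no
    working path was found within the retry limit (a hypothetical limit).
    """
    rng = random.Random(seed)
    label = repath(0, rng)
    for timeouts in range(max_timeouts):
        path = ecmp_path(five_tuple, label, num_paths)
        if path not in failed_paths:
            return path, timeouts
        # Connectivity-failure signal (e.g. a retransmission timeout):
        # pick a new label so the network may hash the flow elsewhere.
        label = repath(label, rng)
    return None, max_timeouts


def added_nines(outage_reduction):
    """Availability gain, in 'nines', from cutting outage time by a fraction."""
    return -math.log10(1.0 - outage_reduction)


if __name__ == "__main__":
    flow = ("2001:db8::1", 40000, "2001:db8::2", 443, "tcp")
    print(send_with_prr(flow, failed_paths={0, 1}, num_paths=4))
    print(round(added_nines(0.63), 2), round(added_nines(0.84), 2))
```

Because a new label only changes the hash input, each repath is an independent draw over the available paths rather than a guaranteed escape; with half of the paths failed, a flow may need more than one timeout before it lands on a working one. Under the reading of "nines" above, the reported 63-84% reduction corresponds to roughly 0.43 to 0.80 nines, which matches the 0.4-0.8 range quoted in the abstract.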
Keywords
Network availability, Multipathing, FlowLabel