AAsclepius: Monitoring, Diagnosing, and Detouring at the Internet Peering Edge.

Kaicheng Yang, Yuanpeng Li, Sheng Long, Tong Yang, Ruijie Miao, Yikai Zhao, Chaoyang Ji, Penghui Mi, Guodong Yang, Qiong Xie, Hao Wang, Yinhua Wang, Bo Deng, Zhiqiang Liao, Chengqiang Huang, Yongqiang Yang, Xiang Huang, Wei Sun, Xiaoping Zhu

USENIX Annual Technical Conference (2023)

Network faults occur frequently on the Internet. From the perspective of cloud service providers, network faults can be classified into three categories: cloud faults, client faults, and middle faults. This paper focuses mainly on middle faults. To minimize the harm of middle faults, we build a fully automatic system in Huawei Cloud, namely AAsclepius, which consists of a monitoring subsystem, a diagnosing subsystem, and a detouring subsystem. Through the collaboration of the three subsystems, AAsclepius monitors and diagnoses network faults, and detours traffic to circumvent middle faults at the Internet peering edge. The key innovation of AAsclepius is to identify the fault direction with a novel technique, namely PathDebugging. AAsclepius has been running stably for two years, protecting Huawei Cloud from major accidents in 2021 and 2022. Our evaluation at three major points of presence in December 2021 shows that AAsclepius can efficiently and safely detour traffic to circumvent outbound faults within a few minutes.