MARS: Fault Localization in Programmable Networking Systems with Low-cost In-Band Network Telemetry

Proceedings of the 52nd International Conference on Parallel Processing, ICPP 2023 (2023)

Abstract
Recently, Software Defined Networking (SDN) has gained significant popularity as a network infrastructure. Although the openness and programmability of SDN ease the construction of large, complex networks, diagnosing faults in a complex datacenter-scale network remains challenging, yet it is crucial for meeting the rigorous service level agreements (SLAs) of upper-layer applications. Previous network diagnosis tools incur significant overhead for fine-grained telemetry and usually cannot diagnose fine-grained faults automatically. Although on-demand monitoring methods have been proposed to reduce telemetry overhead, they depend on static thresholds that are difficult to set effectively and require expert experience. In this paper, we present MARS, a lightweight system for anomaly detection with dynamic thresholds and automatic root cause localization in programmable networking systems. MARS collects aggregated packet-level telemetry on demand and generates a ranked list of fine-grained fault culprits at multiple levels, including the port, switch, and flow levels. Experimental evaluations show that MARS is cost-effective in terms of both network bandwidth and switch memory usage. Moreover, MARS achieves a 0.97 F1 score in anomaly detection, as well as a Recall at Top-2 of 0.95 and an overall Exam Score of 0.3 in root cause localization.
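To make the two halves of the abstract more concrete, the sketch below shows (a) one generic way to replace a static threshold with a dynamic one and (b) how the reported evaluation metrics are commonly computed. It is an illustration under stated assumptions, not MARS's implementation: the abstract does not say how MARS derives its dynamic threshold, so a sliding-window "mean + k standard deviations" rule is swapped in here, and the window size, `k`, and warm-up length are assumed values. The Exam Score is assumed to follow the usual fault-localization definition (the fraction of the ranked candidate list inspected before reaching the true culprit; lower is better). All names (`DynamicThreshold`, `recall_at_k`, `exam_score`, the `"s1:p2"` switch:port labels) are hypothetical.

```python
import math
from collections import deque


class DynamicThreshold:
    """Hypothetical detector: flag a sample that exceeds mean + k * std of a sliding window.

    Assumed technique and parameters; not taken from MARS.
    """

    def __init__(self, window: int = 20, k: float = 3.0, warmup: int = 5):
        self.samples = deque(maxlen=window)  # recent "normal" samples form the baseline
        self.k = k
        self.warmup = warmup  # samples collected before any alarm may be raised

    def update(self, value: float) -> bool:
        anomalous = False
        if len(self.samples) >= self.warmup:
            mean = sum(self.samples) / len(self.samples)
            var = sum((x - mean) ** 2 for x in self.samples) / len(self.samples)
            anomalous = value > mean + self.k * math.sqrt(var)
        # Only normal samples update the baseline, so a fault does not inflate its own threshold.
        if not anomalous:
            self.samples.append(value)
        return anomalous


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for anomaly detection."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def recall_at_k(rankings, truths, k: int) -> float:
    """Fraction of fault cases whose true culprit appears in the top-k of the ranked list.

    Assumes at least one fault case.
    """
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


def exam_score(rankings, truths) -> float:
    """Mean fraction of candidates inspected until the true culprit is reached (lower is better).

    Assumes the true culprit is always present in its ranked candidate list.
    """
    scores = [(ranked.index(truth) + 1) / len(ranked) for ranked, truth in zip(rankings, truths)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical per-port queue-depth samples with one spike.
    detector = DynamicThreshold(window=20, k=3.0)
    print([detector.update(q) for q in [10, 11, 9, 10, 12, 10, 11, 80, 10]])

    # Two hypothetical fault cases, each with a ranked list of suspected switch:port culprits.
    rankings = [["s1:p2", "s3:p1", "s2:p4"], ["s2:p1", "s2:p3", "s1:p1"]]
    truths = ["s1:p2", "s2:p3"]
    print(recall_at_k(rankings, truths, 1), recall_at_k(rankings, truths, 2))
    print(exam_score(rankings, truths))
```

In this toy data the second case's culprit is ranked second, so Recall at Top-1 misses it while Recall at Top-2 does not; this is why the abstract reports recall at a specific cut-off alongside an Exam Score that summarizes how far down the list an operator must look.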
Keywords
P4, In-band Network Telemetry, Fault Localization, Software Defined Network