Advancing Network Monitoring and Operation with In-band Network Telemetry and Data Plane Programmability

NOMS (2023)

Abstract
Modern communication networks operate under high expectations on performance and resilience (e.g., latency, bandwidth, availability), mainly due to the continuous proliferation of non-elastic, highly distributed applications. In this context, closely monitoring the state, behavior, and performance of networking devices and their traffic, as well as quickly troubleshooting problems as they arise, is essential for operating network infrastructures. Data Plane Programmability (DPP) and In-band Network Telemetry (INT), backed by recent advances in Software-Defined Networking (SDN), emerge as promising platforms to meet these monitoring demands. In this thesis, we make several contributions that advance the discipline of network monitoring and operation. We introduce and formalize the In-band Network Telemetry Orchestration (INTO) problem, which consists of assigning subsets of traffic to carry INT so as to provide full monitoring coverage while minimizing overhead. We prove this problem to be NP-complete and propose polynomial-time heuristics to solve it. In our evaluation using real wide-area network topologies, we observe that the heuristics produce near-optimal solutions for any network in under one second. Continuing our work, we investigate DPP capabilities further and design IntSight, a system for highly accurate and fine-grained detection and diagnosis of service-level objective (SLO) violations. Our evaluation using real networks shows that IntSight generates up to two orders of magnitude less monitoring traffic than state-of-the-art approaches. As a final step, we shift our focus to quick reaction and propose Felix, a system for failure recovery that reroutes around failures at data-plane timescales while still using the shortest available paths. Our evaluation shows that Felix can recover from failures up to four orders of magnitude faster than existing SDN approaches while making sensible use of data-plane resources.
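
To make the INTO formulation more concrete, here is a minimal sketch of a classic greedy set-cover approach. It is not the thesis's heuristic and rests on simplifying assumptions: each flow is reduced to the list of devices on its path, a device counts as monitored once any INT-carrying flow traverses it, and overhead is approximated by the number of flows selected to carry INT. The function name `greedy_int_assignment` and all flow and device identifiers are hypothetical. Greedy set cover runs in polynomial time and is known to stay within roughly a logarithmic factor of the optimal cover, which illustrates the kind of trade-off a polynomial-time heuristic makes for an NP-complete assignment problem.

```python
from typing import Dict, List, Set


def greedy_int_assignment(flow_paths: Dict[str, List[str]]) -> List[str]:
    """Hypothetical sketch: pick INT-carrying flows until every device is covered.

    Assumed model (not from the thesis): overhead = number of selected flows,
    and a device is monitored once any selected flow traverses it.
    """
    # Every device that appears on at least one flow path must be monitored.
    uncovered: Set[str] = {dev for path in flow_paths.values() for dev in path}
    remaining = dict(flow_paths)
    selected: List[str] = []
    while uncovered:
        # Choose the flow whose path covers the most still-unmonitored devices;
        # ties go to the flow that appears first in insertion order.
        best = max(remaining, key=lambda f: len(uncovered.intersection(remaining[f])))
        selected.append(best)
        uncovered -= set(remaining[best])
        del remaining[best]
    return selected


if __name__ == "__main__":
    flows = {
        "f1": ["s1", "s2", "s3"],
        "f2": ["s3", "s4"],
        "f3": ["s1", "s4"],
        "f4": ["s5", "s4"],
    }
    print(greedy_int_assignment(flows))  # ['f1', 'f4']
```

Because the set of devices to cover is built from the flow paths themselves, some remaining flow always covers at least one uncovered device, so the loop terminates. A fuller model might weight each flow by its packet rate or cap how much telemetry a single packet can carry.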
Keywords
data plane programmability,data-plane resources,data-plane timescales,fine-grained detection,fine-grained diagnosis,highly-distributed applications,in-band network telemetry orchestration problem,INTO problem,magnitude less monitoring traffic,network operation,network infrastructures,network monitoring,NP-complete problem,polynomial computing time heuristic,SDN,software-defined networking,wide-area network topologies