netCSI: A Generic Fault Diagnosis Algorithm for Large-Scale Failures in Computer Networks

SRDS '11 Proceedings of the 2011 IEEE 30th International Symposium on Reliable Distributed Systems (2016)

Abstract
In this paper we present a framework and a set of algorithms for determining faults in networks when large-scale outages occur. The design principles of our algorithm, netCSI, are motivated by the fact that failures are geographically clustered in such cases. We address the challenge of determining faults with incomplete symptom information due to a limited number of reporting nodes in the network. netCSI consists of two parts: a hypotheses generation algorithm and a ranking algorithm. When constructing the hypotheses list of potential causes, we make novel use of positive and negative symptoms to improve the precision of the results. The ranking algorithm is based on conditional failure probability models that account for the geographic correlation of the network objects in clustered failures. We evaluate the performance of netCSI for networks with both random and realistic topologies. We compare the performance of netCSI with an existing fault diagnosis algorithm, MAX-COVERAGE, and achieve an average gain of 128% in accuracy for realistic topologies.
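The two-part structure described in the abstract can be illustrated with a minimal Python sketch. It is not the netCSI algorithm itself, whose details are not given here. It assumes that each symptom is a monitored path represented as a set of network objects, that a negative symptom (a working path) exonerates every object on it, and that a hypothesis is a minimal set of remaining suspects covering every failed path. The ranking step uses mean pairwise distance as a simple stand-in for the paper's conditional failure probability models. All function names, the `max_size` parameter, and the `coords` mapping are hypothetical.

```python
from itertools import combinations
from math import dist


def candidate_objects(failed_paths, working_paths):
    """Objects seen on a failed path that no working path exonerates."""
    healthy = set().union(*working_paths)
    suspects = set().union(*failed_paths)
    return suspects - healthy


def generate_hypotheses(failed_paths, working_paths, max_size=3):
    """Enumerate minimal suspect sets that explain every failed path.

    Assumption: a hypothesis explains a failed path if it contains at
    least one object on that path. Brute force, for small inputs only.
    """
    suspects = sorted(candidate_objects(failed_paths, working_paths))
    hypotheses = []
    for size in range(1, max_size + 1):
        for combo in combinations(suspects, size):
            chosen = set(combo)
            # Keep only minimal hypotheses: skip supersets of accepted ones.
            if any(h <= chosen for h in hypotheses):
                continue
            if all(chosen & set(path) for path in failed_paths):
                hypotheses.append(chosen)
    return hypotheses


def rank_by_clustering(hypotheses, coords):
    """Order hypotheses so geographically tighter ones come first.

    Assumption: mean pairwise Euclidean distance stands in for a
    geographic correlation model; ties are broken by hypothesis size.
    """
    def spread(hypothesis):
        points = [coords[obj] for obj in hypothesis]
        pairs = list(combinations(points, 2))
        if not pairs:
            return 0.0
        return sum(dist(a, b) for a, b in pairs) / len(pairs)

    return sorted(hypotheses, key=lambda h: (spread(h), len(h)))


if __name__ == "__main__":
    failed = [{"r1", "r2", "r3"}, {"r2", "r4"}]   # positive symptoms
    working = [{"r1", "r5"}]                      # negative symptoms
    coords = {"r2": (0, 0), "r3": (1, 0), "r4": (1, 1)}
    for hypothesis in rank_by_clustering(
        generate_hypotheses(failed, working), coords
    ):
        print(sorted(hypothesis))
```

In this toy input the working path removes `r1` from consideration, which is the kind of precision gain from negative symptoms that the abstract refers to. A coverage-only approach would still treat `r1` as a suspect.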
Keywords
computer networks, generic fault diagnosis algorithm, geographic correlation, large-scale failures, design principle, average gain, conditional failure probability model, hypotheses generation algorithm, network object, realistic topology, ranking algorithm, existing fault diagnosis algorithm, hypotheses list, clustering algorithms, mathematical model, optimization, incomplete information, algorithm design and analysis