Mining causality graph for automatic web-based service diagnosis

2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC) (2016)

Abstract
It is crucial for Internet companies to provide highly reliable web-based services. These services typically comprise many components that run on large-scale infrastructure and interact in complex ways. Diagnosis is an indispensable part of high reliability, yet it remains a thorny problem that grows harder as systems increase in scale and complexity. In this paper, we propose an automatic diagnosis system based on a causality graph to help system operators find root causes. The causality graph is extracted mainly from the historical data of the monitoring system, and the method consists of four steps. 1) A data mining method extracts the initial causality graph. 2) Once a failure happens, a ranking algorithm based on the causality graph lists the top-k suspects. 3) System operators check the suspects and label each one as right or wrong. 4) A supervised learning algorithm takes the labels as input to tune the causality graph, iteratively improving the diagnosis accuracy of step 2. This method requires neither knowledge of the design and implementation details of the web-based service nor instrumentation of the services' source code. Our controlled experiments show that the root causes can be ranked in the top 3 with 100% accuracy after a countable number of learning iterations.
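The abstract does not specify the mining, ranking, or learning algorithms, so the following is only a minimal sketch of how steps 2 to 4 could fit together, not the authors' method. It assumes the causality graph is a dictionary of edge weights in (0, 1], scores each suspect by the strongest weight-product path leading to the failed metric, and nudges the edge leaving a suspect up or down according to the operator's label. The function names, the metric names, the weights, and the update rule are all hypothetical.

```python
import heapq


def rank_suspects(graph, failed, k=3):
    """Return the top-k (node, score) suspects for a failed node.

    graph[cause][effect] is an assumed edge weight in (0, 1].
    A suspect's score is the largest product of weights along any
    path from the suspect to the failed node (illustrative choice).
    """
    # Reverse adjacency: effect -> [(cause, weight)]
    reverse = {}
    for cause, effects in graph.items():
        for effect, weight in effects.items():
            reverse.setdefault(effect, []).append((cause, weight))

    best = {failed: 1.0}
    heap = [(-1.0, failed)]
    while heap:
        neg_score, node = heapq.heappop(heap)
        score = -neg_score
        if score < best.get(node, 0.0):
            continue  # stale heap entry
        for cause, weight in reverse.get(node, []):
            candidate = score * weight
            if candidate > best.get(cause, 0.0):
                best[cause] = candidate
                heapq.heappush(heap, (-candidate, cause))

    suspects = [n for n in best if n != failed]
    suspects.sort(key=lambda n: (-best[n], n))  # ties broken by name
    return [(n, best[n]) for n in suspects[:k]]


def apply_feedback(graph, suspect, effect, correct, rate=0.5):
    """Tune one edge from an operator label (hypothetical update rule).

    A 'right' label moves the weight toward 1; a 'wrong' label
    shrinks it toward 0 without ever reaching 0.
    """
    weight = graph[suspect][effect]
    if correct:
        weight += rate * (1.0 - weight)
    else:
        weight *= 1.0 - rate
    graph[suspect][effect] = weight


# Hypothetical mined graph: cause -> {effect: weight}
example_graph = {
    "cpu_high": {"slow_response": 0.7},
    "disk_full": {"slow_response": 0.5},
    "net_loss": {"slow_response": 0.3},
}

first_round = rank_suspects(example_graph, "slow_response", k=2)

# Operator labels: cpu_high was wrong, disk_full was right.
apply_feedback(example_graph, "cpu_high", "slow_response", correct=False)
apply_feedback(example_graph, "disk_full", "slow_response", correct=True)

second_round = rank_suspects(example_graph, "slow_response", k=2)
```

In this toy run the labeled feedback moves disk_full ahead of cpu_high in the second ranking, which mirrors the iterative tuning loop the abstract describes. A real system would need a mined graph and a learning rule validated against actual failure data.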
Keywords
automatic Web-based service diagnosis, Internet company, large-scale infrastructure, historical data, monitoring system, causality graph mining, data mining, initial causality graph extraction, top-k suspects, ranking algorithm, supervised learning algorithm, service source code, countable learning iterations