Learning A Hierarchical Monitoring System For Detecting And Diagnosing Service Issues

KDD '15: The 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 2015

Abstract
We propose a machine-learning-based framework for building a hierarchical monitoring system to detect and diagnose service issues. We demonstrate its use for building a monitoring system for a distributed data storage and computing service consisting of tens of thousands of machines. Our solution has been deployed in production as an end-to-end system, from telemetry data collection on individual machines to a visualization tool that service operators use to examine the detection outputs. Evaluation results are presented on detecting 19 customer-impacting issues in the past three months.