DARC: High-dimensional Diffusing Anomaly Detection and Root Cause Location in Cloud Computing Systems.

2023 IEEE International Conference on Big Data (BigData) (2023)

Abstract
The modern cloud computing system has evolved into a highly dynamic and complex ecosystem with thousands of modules. Modifications and updates to these modules occur every day to accommodate customer needs. Unfortunately, these frequent changes may introduce anomalies into the system, and their diffusion can degrade system performance and even cause system outages. Detecting anomalies at an early stage and locating their root causes is important but challenging, because of the complexity of the cloud ecosystem and the huge number of attribute combinations. This paper proposes DARC for high-dimensional diffusing anomaly detection and root cause location in cloud computing systems. DARC first uses two-stage percentile analysis and Mann-Kendall score thresholding to detect rare anomalies, and then applies a bottom-up search strategy with three computational complexity reduction techniques to locate the root causes efficiently. Extensive experiments show that DARC can accurately and efficiently locate the root causes of diffusing anomalies. It has been used successfully in the daily practice of Alibaba Cloud, one of the world’s largest cloud computing service providers.
Keywords
high-dimensional diffusing anomaly detection, root cause location, cloud computing
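
The abstract names Mann-Kendall score thresholding as part of the detection step but does not say how the score is computed or thresholded. As a minimal sketch, the standard Mann-Kendall S statistic sums the signs of all pairwise differences in a time series, so a steadily growing (diffusing) metric yields a large positive score. The function names and the `threshold` parameter below are hypothetical illustrations, not DARC's actual implementation.

```python
from itertools import combinations


def mann_kendall_score(series):
    """Return the standard Mann-Kendall S statistic.

    S is the sum of sign(x_j - x_i) over all pairs with i < j.
    """
    score = 0
    # combinations() preserves order, so `earlier` always precedes `later`.
    for earlier, later in combinations(series, 2):
        if later > earlier:
            score += 1
        elif later < earlier:
            score -= 1
    return score


def is_trending_up(series, threshold):
    """Hypothetical thresholding rule: flag a series whose score reaches `threshold`.

    The paper's actual threshold choice is not given in the abstract.
    """
    return mann_kendall_score(series) >= threshold
```

A strictly increasing series of length n reaches the maximum score n(n-1)/2, and a constant series scores 0, so the threshold can be read as how consistently the metric must rise before it is flagged.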