Generic and Robust Localization of Multi-dimensional Root Causes

2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE), 2019

Abstract
Operators of online software services periodically collect various measures with many attributes. When a measure becomes abnormal, indicating service problems such as reliability degradation, operators would like to rapidly and accurately localize the root cause attribute combinations within a huge multi-dimensional search space. Unfortunately, previous approaches are not generic or robust: they all suffer from one or more of the following limitations: impractical root cause assumptions, handling only directly collected measures but not derived ones, handling only anomalies with significant magnitudes but not insignificant yet important ones, requiring manual parameter fine-tuning, or being too slow. This paper proposes a generic and robust multi-dimensional root cause localization approach, Squeeze, which is the first in the literature to overcome all of the above limitations. Through our novel bottom-up then top-down search strategy and techniques based on our proposed generalized ripple effect and generalized potential score, Squeeze reaches a good trade-off between search speed and accuracy in a generic and robust manner. Case studies in several banks and an Internet company show that Squeeze can localize root causes much more rapidly and accurately than traditional manual analysis. Furthermore, our extensive experiments on semi-synthetic datasets show that the F1-score of Squeeze exceeds that of previous approaches by 0.4 on average, while its localization time is only about 10 seconds.
Keywords
multi-dimensional, root cause localization, generalized ripple effect, generalized potential score, bottom-up and top-down