HRCA: A Heterogeneous Graph-based Adaptive Root Cause Analysis Framework

2023 IEEE 34th International Symposium on Software Reliability Engineering Workshops (ISSREW), 2023

Abstract
The paper introduces HRCA, a Heterogeneous graph-based Root Cause Analysis framework for large-scale cloud platforms. As cloud platforms expand rapidly, ensuring their stability and reliability becomes increasingly important. However, the dynamic and complex call relationships among services, together with the large number of infrastructure components, make Root Cause Analysis (RCA) difficult. HRCA addresses these challenges with an adaptive root cause analysis plan that integrates data from multiple sources into a unified heterogeneous graph model. The framework uses a subgraph extraction module to improve efficiency and accuracy, and a supervised random walk algorithm to diagnose root causes. Comparative evaluations against MicroRCA, AutoMAP, MicroDiag, and Groot show that HRCA outperforms these state-of-the-art methods in accuracy and generalization ability. HRCA is currently deployed on the Huawei Cloud Stack platform for root cause analysis in production environments.
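To make the random walk idea concrete, here is a minimal sketch of a plain random walk with restart over a small typed dependency graph. It is not HRCA's algorithm: the abstract does not give the details of the supervised variant, so the edge weights below are set by hand where a supervised method would learn them. The node names, the `type:name` labelling, the anomaly scores, and the function `random_walk_with_restart` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def random_walk_with_restart(edges, anomaly_scores, restart=0.15, iters=100):
    """Rank nodes by the visit probability of a random walk with restart.

    edges: list of (src, dst, weight), read as "src depends on dst".
    anomaly_scores: dict node -> non-negative score; normalized, it is the
        restart distribution (assumes at least one score is positive).
    """
    nodes = sorted({n for s, d, _ in edges for n in (s, d)} | set(anomaly_scores))
    out = defaultdict(list)
    for s, d, w in edges:
        out[s].append((d, w))

    total = sum(anomaly_scores.values())
    prior = {n: anomaly_scores.get(n, 0.0) / total for n in nodes}

    rank = dict(prior)
    for _ in range(iters):
        # Every step, a fraction `restart` of the walk jumps back to the prior.
        nxt = {n: restart * prior[n] for n in nodes}
        for n in nodes:
            mass = (1.0 - restart) * rank[n]
            targets = out.get(n)
            if not targets:
                # Nodes without dependencies keep their mass (self-loop).
                nxt[n] += mass
                continue
            weight_sum = sum(w for _, w in targets)
            for d, w in targets:
                nxt[d] += mass * w / weight_sum
        rank = nxt
    return sorted(rank.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical heterogeneous graph: service nodes and one host node.
EDGES = [
    ("service:frontend", "service:checkout", 0.9),
    ("service:frontend", "service:catalog", 0.1),
    ("service:checkout", "service:db", 1.0),
    ("service:db", "host:host-1", 1.0),
]
# Hypothetical anomaly scores; the catalog service looks healthy.
ANOMALY = {
    "service:frontend": 1.0,
    "service:checkout": 1.0,
    "service:db": 1.0,
    "host:host-1": 1.0,
    "service:catalog": 0.0,
}

ranking = random_walk_with_restart(EDGES, ANOMALY)
for node, score in ranking:
    print(f"{node}: {score:.4f}")
```

In this toy setup the walk follows dependency edges away from the symptomatic services, so probability mass collects on the components that the anomalous nodes ultimately depend on. A real deployment would also need the subgraph extraction step the abstract mentions, which this sketch omits.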
Keywords
cloud platforms, root cause analysis, AIOps