Robust Failure Diagnosis of Microservice System Through Multimodal Data

Shenglin Zhang, Pengxiang Jin, Zihan Lin, Yongqian Sun, Bicheng Zhang, Sibo Xia, Zhengdan Li, Zhenyu Zhong, Minghua Ma, Wa Jin, Dai Zhang, Zhenyu Zhu, Dan Pei

IEEE Transactions on Services Computing (2023)

Abstract
Automatic failure diagnosis is crucial for large microservice systems. Currently, most failure diagnosis methods rely solely on single-modal data (i.e., only one of metrics, logs, or traces). Through an empirical study of real-world failure cases, we show that combining these data sources (multimodal data) leads to more accurate diagnosis. However, effectively representing multimodal data and addressing imbalanced failures remain challenging. To tackle these issues, we propose DiagFusion, a robust failure diagnosis approach that uses multimodal data. It leverages embedding techniques and data augmentation to represent the multimodal data of service instances, combines deployment data and traces to build a dependency graph, and uses a graph neural network to localize the root cause instance and determine the failure type. Evaluations on real-world datasets show that DiagFusion outperforms existing methods in both root cause instance localization (improving by 20.9% to 368%) and failure type determination (improving by 11.0% to 169%).
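The abstract does not give DiagFusion's implementation. The following is only a minimal pure-Python sketch of the general pipeline shape it describes: build an instance dependency graph from trace calls and deployment data, run one round of neighbour aggregation over per-instance feature vectors, then rank candidate root cause instances and pick a failure type. Everything concrete here is an assumption for illustration: the instance and host names, the (caller, callee) trace format, the choice to connect instances deployed on the same host, the 2-dimensional feature vectors (standing in for learned multimodal embeddings), the failure type labels, and the hand-set weights. The single mean-aggregation layer is a GraphSAGE-style stand-in, not the paper's GNN, and no data augmentation or training is shown.

```python
# Minimal sketch: dependency graph + one mean-aggregation GNN layer.
# All names, data formats, and weights are hypothetical illustrations,
# not DiagFusion's actual implementation.


def add_edge(adj, a, b):
    """Add an undirected edge between two service instances (no self-loops)."""
    if a != b:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)


def build_dependency_graph(call_pairs, deployment):
    """Build an undirected instance graph.

    call_pairs: (caller, callee) instance pairs extracted from traces.
    deployment: dict mapping instance -> host; instances on the same host
    are connected too (an assumption made for this sketch).
    """
    adj = {}
    for caller, callee in call_pairs:
        add_edge(adj, caller, callee)
    by_host = {}
    for inst, host in deployment.items():
        adj.setdefault(inst, set())
        by_host.setdefault(host, []).append(inst)
    for insts in by_host.values():
        for i, a in enumerate(insts):
            for b in insts[i + 1:]:
                add_edge(adj, a, b)
    return adj


def mean_aggregate(adj, features):
    """One GraphSAGE-style layer: concatenate each node's own features
    with the mean of its neighbours' features (zeros if it has none)."""
    hidden = {}
    for node, feat in features.items():
        neigh = [features[n] for n in adj.get(node, ()) if n in features]
        if neigh:
            mean = [sum(col) / len(neigh) for col in zip(*neigh)]
        else:
            mean = [0.0] * len(feat)
        hidden[node] = feat + mean
    return hidden


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def rank_root_causes(hidden, weights):
    """Score each instance with a linear head; highest score first."""
    scores = {n: dot(weights, h) for n, h in hidden.items()}
    return sorted(scores, key=lambda n: (-scores[n], n))


def classify_failure_type(hidden, type_weights):
    """Mean-pool all node vectors, then pick the best-scoring failure type."""
    vecs = list(hidden.values())
    pooled = [sum(col) / len(vecs) for col in zip(*vecs)]
    scores = {t: dot(w, pooled) for t, w in type_weights.items()}
    return max(sorted(scores), key=lambda t: scores[t])


# Toy example with made-up instances, hosts, and 2-d feature vectors.
calls = [("frontend-0", "cart-0"), ("cart-0", "redis-0"), ("frontend-0", "checkout-0")]
deployment = {"frontend-0": "node-a", "cart-0": "node-b",
              "redis-0": "node-b", "checkout-0": "node-c"}
features = {"frontend-0": [0.2, 0.1], "cart-0": [0.3, 0.2],
            "redis-0": [0.9, 0.8], "checkout-0": [0.1, 0.0]}

graph = build_dependency_graph(calls, deployment)
hidden = mean_aggregate(graph, features)
ranking = rank_root_causes(hidden, weights=[1.0, 1.0, 0.5, 0.5])
failure_type = classify_failure_type(
    hidden, {"cpu": [1.0, 0.0, 0.0, 0.0], "network": [0.0, 1.0, 0.0, 0.0]})
print(ranking)       # ['redis-0', 'cart-0', 'frontend-0', 'checkout-0']
print(failure_type)  # cpu
```

In this toy setup, the neighbour-mean half of each hidden vector lets an instance's score reflect anomalies around it, so `cart-0` ranks above `frontend-0` partly because it neighbours the strongly anomalous `redis-0`. A trained model would learn the weights, typically stack several layers, and use real embeddings instead of hand-picked numbers.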
Keywords
Microservice systems, failure diagnosis, multimodal data, graph neural network