Efficient and Robust Trace Anomaly Detection for Large-Scale Microservice Systems

Shenglin Zhang, Zhongjie Pan, Heng Liu, Pengxiang Jin, Yongqian Sun, Qianyu Ouyang, Jiaju Wang, Xueying Jia, Yuzhi Zhang, Hui Yang, Yongqiang Zou, Dan Pei

2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), 2023

Abstract
Microservice invocation anomalies can harm user experience and reduce service revenue. Existing trace anomaly detection approaches typically focus on anomalies in response time and invocation structure, but they often overlook fine-grained features that can help detect anomalies. In addition, trace data collected in real-world scenarios usually contains noise, which can reduce the effectiveness of anomaly detection approaches. Furthermore, the sheer volume of large-scale trace data can make model training slow. To address these challenges, we propose TraceSieve, an unsupervised method that accurately detects trace anomalies. Our approach uses an auto-encoder architecture within an adversarial training framework to filter out noisy data. We also integrate VGAE-EWC, which combines a Variational Graph Auto-Encoder (VGAE) with Elastic Weight Consolidation (EWC), to reduce the large amount of time consumed during training. Finally, we localize the root cause of trace anomalies. We evaluate TraceSieve on two datasets, where it achieves F1-scores of 0.970 and 0.925, respectively, outperforming state-of-the-art trace anomaly detection approaches.
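The abstract names Elastic Weight Consolidation as the ingredient that cuts training time but does not say how it is applied. As a rough illustration only, the sketch below implements the generic EWC regularizer in NumPy: after training on an earlier batch of traces, the parameters θ* and a diagonal Fisher estimate F are frozen, and training on new data minimizes the new loss plus (λ/2) Σ_i F_i (θ_i − θ*_i)², presumably so the model can be updated incrementally on new traces rather than retrained from scratch. This is the textbook formulation applied to a flat parameter vector, not TraceSieve's VGAE-EWC implementation; the function names, the empirical-Fisher estimate, and the plain gradient-descent step are all assumptions made for illustration.

```python
# Generic EWC sketch (assumption): NOT TraceSieve's actual VGAE-EWC code.
# Parameters are treated as one flat NumPy vector for simplicity.
import numpy as np


def fisher_diagonal(per_sample_grads: np.ndarray) -> np.ndarray:
    """Empirical diagonal Fisher information.

    per_sample_grads has shape (num_samples, num_params); each row is the
    gradient of one sample's log-likelihood w.r.t. the parameters, computed
    on the previous data after training on it.
    """
    return np.mean(per_sample_grads ** 2, axis=0)


def ewc_penalty(theta, theta_star, fisher, lam):
    """(lam / 2) * sum_i F_i * (theta_i - theta_star_i) ** 2."""
    return 0.5 * lam * float(np.sum(fisher * (theta - theta_star) ** 2))


def ewc_penalty_grad(theta, theta_star, fisher, lam):
    """Gradient of ewc_penalty with respect to theta: lam * F * (theta - theta_star)."""
    return lam * fisher * (theta - theta_star)


def ewc_step(theta, new_task_grad, theta_star, fisher, lam, lr):
    """One gradient-descent step on (new-data loss + EWC penalty)."""
    total_grad = new_task_grad + ewc_penalty_grad(theta, theta_star, fisher, lam)
    return theta - lr * total_grad


if __name__ == "__main__":
    theta_star = np.array([1.0, 2.0])  # hypothetical weights after the previous training window
    fisher = fisher_diagonal(np.array([[1.0, 2.0], [3.0, 0.0]]))  # [5.0, 2.0]
    theta = np.array([1.5, 2.0])       # current weights, drifted on the first parameter

    print(ewc_penalty(theta, theta_star, fisher, lam=1.0))  # 0.625
    # With a zero new-data gradient, the penalty alone pulls theta back toward theta_star.
    print(ewc_step(theta, np.zeros(2), theta_star, fisher, lam=1.0, lr=0.1).tolist())  # [1.25, 2.0]
```

Because the penalty gradient λF_i(θ_i − θ*_i) scales with F_i, parameters that mattered for earlier data are held close to θ*, while parameters with small F_i remain free to fit newly arriving traces.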
Keywords
microservice, trace, failure detection