Cloud Service Failure Prediction on Google’s Borg Cluster Traces Using Traditional Machine Learning

Adrian-Ioan Tuns, Adrian Spătaru

2023 25th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2023

Abstract
Predicting failures in complex systems is crucial for keeping them performing well, because it opens the possibility of reducing downtime and minimizing costs. In cloud computing, failures are among the most pressing problems: they lead to substantial financial losses and reduce the productivity of both industrial and end users. This paper presents a comprehensive study of failure prediction that explores four machine learning algorithms: Decision Tree, Random Forest, Gradient Boosting, and Logistic Regression. The research analyzes the workload of an industrial set of clusters, provided as Google’s Borg cluster workload traces. The aim was to develop highly accurate predictive models for both job and task failures, and this goal was achieved: the resulting job classifier (Gradient Boosting) reached 83.97% accuracy, and the task classifier (Decision Tree) reached 98.79% accuracy.
Keywords
failure prediction, big data, machine learning, classification algorithms, Google Borg