Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study

2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), 2023

Abstract
Online Failure Prediction (OFP) is a technique that attempts to predict impending failures so that their consequences can be mitigated. Machine Learning (ML) has been successfully used to create predictive models for OFP, but failures are rare, so failure data are typically not available. Fault injection has been accepted as a viable alternative for generating failure data. However, this raises several challenges, such as how to process the data to create and assess predictive models. The characteristics of fault injection campaigns (e.g., repeated and controlled experiments) and of OFP (e.g., autocorrelation) require specific considerations. This work proposes a six-stage methodology for using failure data generated through fault injection to create accurate and representative models for OFP. It covers the various phases, from generating and processing the data, through creating and assessing the performance of the models, up to their deployment, while considering the intrinsic characteristics of the problem. As a case study, we apply the methodology to develop failure predictors for the Linux Operating System (OS). Results show that the proposed methodology led to accurate predictive models that could also generalize to failures occurring under different execution profiles, whereas traditional techniques resulted in over-optimistic observations.
Keywords
Dependability, Failure Prediction, Fault Injection, Machine Learning
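
The abstract notes that repeated experiments and autocorrelation call for specific care when assessing models, and that traditional techniques gave over-optimistic results. The abstract does not describe the paper's actual procedure, so the following is only a minimal sketch of one common precaution: keeping all samples from the same fault-injection run on the same side of a train/test split, so that highly correlated neighbouring samples cannot leak across it. The function name `split_by_run`, the `(run_id, features, label)` tuple layout, and the toy data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_run(samples, test_fraction=0.25, seed=0):
    """Split samples so each run lands entirely in train or entirely in test.

    `samples` is a list of (run_id, features, label) tuples. This layout is
    a hypothetical one chosen for the sketch, not taken from the paper.
    """
    # Group samples by the run that produced them.
    by_run = defaultdict(list)
    for sample in samples:
        by_run[sample[0]].append(sample)

    # Shuffle run identifiers (not individual samples) reproducibly.
    run_ids = sorted(by_run)
    random.Random(seed).shuffle(run_ids)

    # Hold out whole runs for testing; always keep at least one.
    n_test = max(1, round(len(run_ids) * test_fraction))
    held_out = set(run_ids[:n_test])

    train = [s for r in run_ids if r not in held_out for s in by_run[r]]
    test = [s for r in run_ids if r in held_out for s in by_run[r]]
    return train, test


# Hypothetical data: 8 runs with 5 consecutive samples each.
samples = [
    (run, [run * 0.1 + t], int(run % 2 == 0))
    for run in range(8)
    for t in range(5)
]
train, test = split_by_run(samples)
train_runs = {s[0] for s in train}
test_runs = {s[0] for s in test}
```

A random per-sample split would instead scatter consecutive, autocorrelated samples of one run across both sets, which is one plausible way an evaluation can look better than the model really is.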