On-line failure prediction in safety-critical systems

Future Generation Computer Systems (2015)

Abstract
In safety-critical systems such as Air Traffic Control systems, SCADA systems, and Railway Control Systems, there has been a rapid transition from monolithic systems to highly modular ones that use off-the-shelf hardware and software applications, possibly developed by different manufacturers. This shift has increased the probability that a fault occurring in one application propagates to others, with the risk of a failure of the entire safety-critical system. This calls for new tools for the on-line detection of anomalous system behavior, so that a system failure can be predicted before it happens and appropriate mitigation policies can be deployed. The paper proposes a novel architecture, named CASPER, for online failure prediction. Its distinctive features are that it is (i) black-box: no knowledge of application internals or of the logic of the system is required; and (ii) non-intrusive: no status information of the components, such as CPU or memory usage, is used. The architecture has been implemented to predict failures in a real Air Traffic Control System. CASPER exhibits a high degree of accuracy in predicting failures, with a low false positive rate. The experimental validation shows that operators are provided with predictions issued a few hundred seconds before the occurrence of the failure.

Effective online failure prediction that is non-intrusive and black-box. We monitor only network traffic to perform online failure prediction. Application agnostic: no knowledge of application logic is required. We use complex event processing to produce a representation of the system state. We use hidden Markov models to create a state recognizer.
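The idea of a hidden-Markov-model state recognizer fed by a discretized view of the system can be illustrated with a small sketch. This is not the CASPER implementation: the two hidden states, the three observation symbols (standing in for whatever the complex event processing stage would emit), every probability value, and the alert threshold are all assumptions chosen for illustration. The sketch runs the standard normalized forward (filtering) recursion and reports the first time step at which the estimated probability of the hypothetical "unsafe" state exceeds the threshold.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical hidden states; the abstract does not name the states CASPER uses.
STATES = ("safe", "unsafe")

# All numbers below are made-up illustration values, not taken from the paper.
INITIAL: Dict[str, float] = {"safe": 0.95, "unsafe": 0.05}
TRANSITION: Dict[str, Dict[str, float]] = {
    "safe": {"safe": 0.9, "unsafe": 0.1},
    "unsafe": {"safe": 0.2, "unsafe": 0.8},
}
# Hypothetical symbols standing in for the output of an event-processing stage.
EMISSION: Dict[str, Dict[str, float]] = {
    "safe": {"normal": 0.8, "degraded": 0.15, "anomalous": 0.05},
    "unsafe": {"normal": 0.2, "degraded": 0.3, "anomalous": 0.5},
}


def filter_states(observations: Sequence[str]) -> List[Dict[str, float]]:
    """Return P(state_t | obs_1..obs_t) for every t (normalized forward pass)."""
    beliefs: List[Dict[str, float]] = []
    prior = INITIAL
    for t, symbol in enumerate(observations):
        if t > 0:
            # Predict step: push the previous belief through the transition model.
            prior = {
                s: sum(beliefs[-1][p] * TRANSITION[p][s] for p in STATES)
                for s in STATES
            }
        # Update step: weight by the likelihood of the observed symbol, then normalize.
        unnormalized = {s: prior[s] * EMISSION[s][symbol] for s in STATES}
        total = sum(unnormalized.values())
        beliefs.append({s: unnormalized[s] / total for s in STATES})
    return beliefs


def first_alert(observations: Sequence[str], threshold: float = 0.7) -> Optional[int]:
    """Index of the first step where P(unsafe) exceeds the threshold, else None."""
    for t, belief in enumerate(filter_states(observations)):
        if belief["unsafe"] > threshold:
            return t
    return None


if __name__ == "__main__":
    trace = ["normal", "anomalous", "anomalous", "anomalous"]
    for step, belief in enumerate(filter_states(trace)):
        print(step, round(belief["unsafe"], 3))
    print("first alert at step:", first_alert(trace))
```

In a real deployment the model parameters would be learned from recorded traces rather than written by hand, and the threshold would be tuned to trade prediction lead time against false positives.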
Keywords
Failure prediction, Complex event processing, Machine learning, Complex distributed systems, Critical infrastructures