A global-state-triggered fault injector for distributed system evaluation

IEEE Transactions on Parallel and Distributed Systems (2004)

Abstract
Validation of the dependability of distributed systems via fault injection is gaining importance because distributed systems are being increasingly used in environments with high dependability requirements. The fact that distributed systems can fail in subtle ways that depend on the state of multiple parts of the system suggests that a global-state-based fault injection mechanism should be used to validate them. However, global-state-based fault injection is challenging since it is very difficult in practice to maintain the global state of a distributed system at runtime with minimal intrusion into the system execution. We present Loki, a global-state-based fault injector, which has been designed with the goals of low intrusion, high precision, and high flexibility. Loki achieves these goals by utilizing the ideas of partial view of global state, optimistic synchronization, and offline analysis. In Loki, faults are injected based on a partial view of the global state of the system, and a post-runtime analysis is performed to place events and injections into a single global timeline and to discard experiments with incorrect fault injections. Finally, the experiments with correct fault injections are used to estimate user-specified performance and dependability measures. A flexible measure language has been designed that facilitates the specification of a wide range of measures.
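The post-runtime check described in the abstract can be illustrated with a minimal sketch. This is not Loki's actual algorithm or API. It assumes that each node's clock offset from a reference clock is known only up to a bounded error, that each local timestamp is mapped to an interval on a single global timeline, and that an injection is kept only if it provably falls inside the intended trigger state. All names (`Interval`, `to_global`, `injection_is_correct`) and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Bounds on when an event occurred on the global timeline."""
    lo: float
    hi: float


def to_global(local_time: float, offset: float, error: float) -> Interval:
    """Map a local timestamp to a bounded interval on the reference timeline.

    `offset` is the estimated amount by which the node's clock runs ahead of
    the reference clock; `error` bounds the uncertainty of that estimate.
    Both are assumptions of this sketch.
    """
    center = local_time - offset
    return Interval(center - error, center + error)


def injection_is_correct(injection: Interval,
                         state_entered: Interval,
                         state_left: Interval) -> bool:
    """Accept only if the injection provably lies inside the trigger state."""
    return state_entered.hi <= injection.lo and injection.hi <= state_left.lo


# Hypothetical experiment: a remote node enters the trigger state at its local
# time 10.0 and leaves at 10.8; the injecting node's clock runs 0.1 ahead.
entered = to_global(10.0, offset=0.0, error=0.01)
left = to_global(10.8, offset=0.0, error=0.01)
good = to_global(10.5, offset=0.1, error=0.01)  # well inside the state
late = to_global(10.9, offset=0.1, error=0.01)  # overlaps the state exit

print(injection_is_correct(good, entered, left))
print(injection_is_correct(late, entered, left))
```

In this sketch the second injection is rejected because its interval overlaps the moment the trigger state ended, so it cannot be proven correct; an experiment containing it would be discarded rather than used for measure estimation.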
Keywords
measure estimation, correct fault injection, global state, partial view of global state, user-specified performance, fault tolerant computing, system evaluation, distributed system evaluation, Loki, incorrect fault injection, partial view, system execution, system recovery, fault injection, single global timeline, post-runtime analysis, distributed systems, performance evaluation, reliable systems, global-state-based fault injection, global-state-based fault injector, global-state-triggered fault injector, global-state-based fault injection mechanism, distributed processing, synchronisation, offline clock synchronization, helium, availability, web server, system testing, air traffic control, synchronization, distributed system, clock synchronization