SAFER: System-level Architecture for Failure Evasion in Real-time Applications

Real-Time Systems Symposium (2012)

Abstract
Recent trends toward increasing complexity in distributed embedded real-time systems make it challenging to design and implement a reliable system such as a self-driving car. The conventional way of improving reliability is to use redundant hardware to replicate the whole (sub)system. Although hardware replication has been widely deployed in hard real-time systems such as avionics, space shuttles and nuclear power plants, it is significantly less attractive for many applications because the amount of necessary hardware multiplies as the size of the system increases. The growing need for flexible system design is also at odds with hardware replication techniques. To address the need for dependability through redundancy operating in real time, we propose a layer called SAFER (System-level Architecture for Failure Evasion in Real-time applications) that incorporates configurable task-level fault-tolerance features to tolerate fail-stop processor and task failures in distributed embedded real-time systems. To detect such failures, SAFER monitors the health status and state information of each task and broadcasts this information. When a failure is detected using either time-based or event-based failure detection, SAFER reconfigures the system to retain the functionality of the whole system. We provide a formal analysis of the worst-case timing behavior of SAFER features. We also describe how a system equipped with SAFER is modeled in SysWeaver, a model-based design tool, to analyze its timing characteristics. SAFER has been implemented on Ubuntu 10.04 LTS and deployed on Boss, an award-winning autonomous vehicle developed at Carnegie Mellon University. We report measurements using simulation scenarios from the 2007 DARPA Urban Challenge. Finally, we present a case study of failure recovery by SAFER when node failures are injected.
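The abstract's combination of time-based failure detection and reconfiguration can be illustrated with a small sketch. This is not SAFER's implementation: the class `HeartbeatMonitor`, the function `promote_standbys`, the task and node names, and the timeout values are all hypothetical, and the sketch assumes a task is treated as failed once its heartbeat has been silent for longer than a per-task timeout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskStatus:
    """Last known health record for one monitored task (hypothetical structure)."""
    node: str
    last_heartbeat: float  # time of the most recent heartbeat, in seconds
    timeout: float         # silence longer than this is treated as a failure


@dataclass
class HeartbeatMonitor:
    """Illustrative time-based detector: flags tasks that stay silent past their timeout."""
    tasks: Dict[str, TaskStatus] = field(default_factory=dict)

    def register(self, task: str, node: str, timeout: float, now: float) -> None:
        # Registration counts as the first heartbeat.
        self.tasks[task] = TaskStatus(node=node, last_heartbeat=now, timeout=timeout)

    def heartbeat(self, task: str, now: float) -> None:
        # Record that the task reported itself healthy at time `now`.
        self.tasks[task].last_heartbeat = now

    def failed_tasks(self, now: float) -> List[str]:
        # A task is suspected when its silence exceeds its timeout.
        return sorted(
            name
            for name, status in self.tasks.items()
            if now - status.last_heartbeat > status.timeout
        )


def promote_standbys(
    placement: Dict[str, str],
    standbys: Dict[str, List[str]],
    failed: List[str],
) -> Dict[str, str]:
    """Return a new task-to-node placement with each failed task moved to its first standby node."""
    new_placement = dict(placement)
    for task in failed:
        candidates = standbys.get(task, [])
        if candidates:
            new_placement[task] = candidates[0]
    return new_placement


monitor = HeartbeatMonitor()
monitor.register("planner", node="node-a", timeout=0.5, now=0.0)
monitor.register("perception", node="node-b", timeout=0.5, now=0.0)
monitor.heartbeat("perception", now=0.4)   # only perception keeps reporting

failed = monitor.failed_tasks(now=0.8)     # planner has been silent for 0.8 s
placement = promote_standbys(
    {"planner": "node-a", "perception": "node-b"},
    {"planner": ["node-c"]},
    failed,
)
print(failed)     # ['planner']
print(placement)  # {'planner': 'node-c', 'perception': 'node-b'}
```

Passing the clock value explicitly as `now` keeps the sketch deterministic. A real deployment would read a monotonic clock and would have to bound detection and recovery latency, which is the subject of the paper's worst-case timing analysis.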
Keywords
embedded, task-level replication, time-based failure detection, task failures, hardware replication, safer, distributed embedded real-time systems, configurable task-level fault-tolerance features, flexible system design, reliable system, event-based failure detection, mobile robots, vehicles, ubuntu 10.04 lts, boss, failure evasion, award-winning autonomous vehicle, distributed, embedded real-time system, real-time applications, system increase, cold standby, failure recovery, software fault tolerance, system-level architecture, whole system, hard real-time systems, fail-stop processor, carnegie mellon university, hard real-time system, real-time, hardware replication techniques, embedded systems, fault-tolerance, distributed processing, safer feature, hot standby