Trading off power and fault-tolerance in real-time embedded systems

NASA/ESA Conference on Adaptive Hardware and Systems (2015)

Abstract
Reliability and fault-tolerance are essential requirements of critical, autonomous computing systems. In this paper, we propose a methodology to quantify, and maximize, the reliability of computation in the presence of transient errors when mapping real-time tasks onto a homogeneous multiprocessor system with voltage and frequency scaling capabilities. Because the likelihood of radiation-induced transient errors is specific to the environment and the components, we use machine learning to estimate the actual fault rate of the system. Furthermore, we leverage probability theory to define a trade-off between power consumption and fault-tolerance. If a processing element fails, our methodology can re-map the application, determining whether the real-time requirements will still be met and how reliable the new, impaired system will be. Results show that the proposed methodology can adjust the mapping and operating frequencies to maintain a fixed level of reliability under different fault rates.
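The power/fault-tolerance trade-off described above can be illustrated with a small sketch. It is not the authors' method: it assumes the exponential fault-rate model that is widely used in the DVFS reliability literature, lambda(f) = lambda0 * 10^(D*(1-f)/(1-F_MIN)), in which lowering the normalized frequency f raises the transient fault rate while also stretching execution time. It also assumes Poisson fault arrivals, independent faults, and a single core. A plain maximum-likelihood Poisson estimate stands in for the paper's machine-learning fault-rate estimator. All names and constants (`D`, `F_MIN`, `estimate_lambda0`, `choose_frequency`, the task parameters) are hypothetical. Because dynamic power is commonly approximated as growing with roughly the cube of frequency when voltage scales with it, the sketch picks the lowest frequency level that meets both the deadline and a fixed reliability target.

```python
import math

# Hypothetical model constants (assumptions, not values from the paper).
D = 3.0       # how strongly the fault rate grows as frequency is lowered
F_MIN = 0.4   # lowest normalized frequency (f_max = 1.0)


def estimate_lambda0(observed_faults, exposure_seconds):
    """Maximum-likelihood estimate of a Poisson fault rate (faults per second).

    Stand-in for the paper's machine-learning estimator (assumption)."""
    return observed_faults / exposure_seconds


def fault_rate(lambda0, f):
    """Exponential DVFS fault model: fault rate at normalized frequency f."""
    return lambda0 * 10 ** (D * (1.0 - f) / (1.0 - F_MIN))


def task_reliability(lambda0, wcet_at_fmax, f):
    """Probability that a task finishes with no transient fault at frequency f."""
    exec_time = wcet_at_fmax / f  # execution stretches as f drops
    return math.exp(-fault_rate(lambda0, f) * exec_time)


def core_reliability(lambda0, wcets, f):
    """Faults assumed independent, so a core's reliability is a product over its tasks."""
    r = 1.0
    for c in wcets:
        r *= task_reliability(lambda0, c, f)
    return r


def choose_frequency(lambda0, wcets, period, target, levels):
    """Lowest (lowest-power) frequency meeting both the deadline and the reliability target."""
    for f in sorted(levels):
        if sum(wcets) / f > period:  # real-time requirement would be violated
            continue
        if core_reliability(lambda0, wcets, f) >= target:
            return f
    return None  # no frequency level satisfies both constraints


if __name__ == "__main__":
    levels = [0.4, 0.6, 0.8, 1.0]
    for faults in (3, 30):  # hypothetical fault counts over 1000 hours
        lam = estimate_lambda0(faults, 3.6e6)
        f = choose_frequency(lam, [0.02, 0.03], period=0.1, target=0.999999, levels=levels)
        print(f"lambda0={lam:.2e}/s -> frequency {f}")
```

With these made-up numbers, the lower estimated fault rate lets the core run at 0.8 of maximum frequency. A tenfold higher fault rate forces it up to 1.0 to hold the same reliability target. This mirrors, qualitatively, the abstract's claim that operating frequencies are adjusted to keep reliability fixed as the fault rate changes.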
Keywords
fault-tolerance,real-time embedded systems,critical autonomous computing systems,transient errors,homogeneous multiprocessor system,voltage scaling capabilities,frequency scaling capabilities,machine learning,actual fault-rate estimation,probability theory,power consumption