Rethread: A Low-Cost Transient Fault Recovery Scheme for Multithreaded Processors

ARES (2014)

Abstract
Transient fault recovery is important for processor availability. However, existing techniques characteristically incur significant silicon or performance overheads. We uncover an opportunity to reduce these overheads dramatically in modern processors, one that arises as a side effect of introducing hardware multithreading to improve performance. We observe that threads are usually short code sequences with no branches and few memory side effects, which means that the number of checkpoints is small and constant. In addition, the per-thread state structures already present in hardware can be reused to provide checkpointing. In this paper, we demonstrate this principle using a hardware/software co-design called Rethread, which features compiler-generated code annotations and automatic recovery in hardware by restarting threads. This approach provides the ability to recover from transient faults without dedicated hardware. Results report the performance degradation both under fault-free conditions (less than 5%) and as a function of the fault rate.
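The recovery principle described above (checkpoint a short thread at its start, and on a detected transient fault discard its work and restart it) can be sketched as a small software simulation. This is a minimal sketch under stated assumptions, not the Rethread hardware design: the `RethreadSim` class, the `faulty_attempts` fault-injection set, and the `saxpy_step` thread body are hypothetical names; fault detection is assumed to happen before the thread commits; and buffering memory writes until a fault-free completion is one assumed way of handling a thread's few memory side effects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

# Hypothetical model: a "thread" is a short, branch-free, deterministic function
# of its initial register state that returns (result registers, memory writes).
ThreadBody = Callable[[Dict[str, int]], Tuple[Dict[str, int], Dict[int, int]]]


@dataclass
class RethreadSim:
    """Illustrative simulator (assumption, not the paper's design)."""
    memory: Dict[int, int] = field(default_factory=dict)
    # Global attempt numbers (0-based) on which a transient fault is injected.
    faulty_attempts: Set[int] = field(default_factory=set)
    attempts: int = 0
    restarts: int = 0

    def run_thread(self, body: ThreadBody, init_regs: Dict[str, int]) -> Dict[str, int]:
        # The only checkpoint: a copy of the thread's state captured at spawn.
        checkpoint = dict(init_regs)
        while True:
            regs = dict(checkpoint)        # every (re)start begins from the checkpoint
            out_regs, writes = body(regs)  # writes are buffered, not yet visible
            fault = self.attempts in self.faulty_attempts
            self.attempts += 1
            if fault:
                # Assumed: the fault is detected before commit, so the attempt's
                # results and buffered writes are discarded and the thread restarts.
                self.restarts += 1
                continue
            self.memory.update(writes)     # commit side effects only on success
            return out_regs


def saxpy_step(regs: Dict[str, int]) -> Tuple[Dict[str, int], Dict[int, int]]:
    # Hypothetical thread body: no branches, a single memory write.
    y = regs["a"] * regs["x"] + regs["y"]
    return {"y": y}, {regs["addr"]: y}


if __name__ == "__main__":
    sim = RethreadSim(faulty_attempts={0, 1})  # first two attempts are hit by faults
    result = sim.run_thread(saxpy_step, {"a": 2, "x": 3, "y": 4, "addr": 100})
    print(result, sim.memory, sim.restarts)
```

Because the thread body is branch-free and deterministic, re-executing it from the same checkpoint reproduces the same result, so in this model one checkpoint per thread suffices regardless of how many faults occur.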
Keywords
Rethread, hardware multithreading, checkpointing, short code sequences, low-cost transient fault recovery scheme, thread re-execution, fault tolerant computing, automatic recovery, multithreading, performance degradation, silicon, fault recovery, modern processors, transient fault, fault-free condition, multithreaded processors, compiler-generated code annotations, state structures, program compilers, hardware, fault detection, instruction sets, bit error rate