Efficient Process Replication for MPI Applications: Sharing Work between Replicas

International Parallel & Distributed Processing Symposium (2015)

Abstract
With the increased failure rate expected in future extreme-scale supercomputers, process replication might become a viable alternative to checkpointing. By default, the workload efficiency of replication is limited to 50% because of the additional resources that have to be used to execute the replicas of the application's processes. In this paper, we introduce intra-parallelization, a solution that avoids replicating all computation by introducing work-sharing between replicas. We show on a representative set of benchmarks that intra-parallelization achieves more than 50% efficiency without compromising fault tolerance.
Keywords
High performance computing, fault tolerance, replication