Fault-Tolerant LOBPCG for Nuclear CI Calculations.

HPC Asia (2023)

Abstract
Exascale computing platforms, with thousands of nodes and millions of compute units, are predicted to experience frequent faults that interrupt application execution. In this context, resilience against faults becomes important. We examine user-level and software-level fault mitigation strategies in a distributed LOBPCG algorithm targeting nuclear CI calculations. In particular, we present and evaluate one strategy that keeps the total number of fault-tolerant LOBPCG iterations close to that of the standard LOBPCG algorithm run on a fault-free machine.