Fault-Tolerant LOBPCG for Nuclear CI Calculations
HPC Asia (2023)
Abstract
Exascale computing platforms, with thousands of nodes and millions of compute units, are predicted to experience frequent faults that interrupt application execution. In this context, resilience against faults becomes important. We examine user-level and software-level fault mitigation strategies in a distributed LOBPCG algorithm targeting nuclear CI calculations. In particular, we present and evaluate one strategy that keeps the total number of fault-tolerant LOBPCG iterations close to that of the standard LOBPCG algorithm run on a fault-free machine.