A Hybrid Approach for Detecting Bugs in HPC Workloads.

Sonam Sherpa, Xinghui Zhao

International Conference on Parallel and Distributed Systems (2023)

Abstract
MPI programs represent a major type of workload running on parallel and distributed systems: tightly coupled high-performance computing (HPC) workloads that use the Message Passing Interface (MPI) to communicate between processes and instances. One of the major challenges for MPI programs is bug detection. Traditional approaches for diagnosing MPI bugs attempt to reproduce the exact execution schedule that reveals the bug, resulting in high runtime overhead. In this paper, we present our work on identifying bugs in MPI programs using a hybrid approach that leverages both static and dynamic analysis and detects bugs at both compile time and runtime. The static approach detects buggy patterns by analyzing the Intermediate Representation (IR) file generated by the LLVM compiler. The dynamic approach takes user-defined rules, verifies them at runtime, and detects any violations that could be caused by a bug. To evaluate our approach, we have carried out experiments to detect various bugs in different benchmarks such as CombBLAS, OpenFFT, and the NAS Parallel Benchmarks. Our results show that the hybrid approach is effective in detecting bugs at both compile time and runtime.
Keywords
High Performance Computing, Message Passing Interface, Static Analysis, Runtime, Bug Detection