A Survey on Malleability Solutions for High-Performance Distributed Computing

APPLIED SCIENCES-BASEL(2022)

引用 4|浏览1
暂无评分
摘要
Maintaining a high rate of productivity, in terms of completed jobs per unit of time, in High-Performance Computing (HPC) facilities is a cornerstone in the next generation of exascale supercomputers. Process malleability is presented as a straightforward mechanism to address that issue. Nowadays, the vast majority of HPC facilities are intended for distributed-memory applications based on the Message Passing (MP) paradigm. For this reason, many efforts are based on the Message Passing Interface (MPI), the de facto standard programming model. Malleability aims to rescale executions on-the-fly, in other words, reconfigure the number and layout of processes in running applications. Process malleability involves resources reallocation within the HPC system, handling processes of the application, and redistributing data among those processes to resume the execution. This manuscript compiles how different frameworks address process malleability, their main features, their integration in resource management systems, and how they may be used in user codes. This paper is a detailed state-of-the-art devised as an entry point for researchers who are interested in process malleability.
更多
查看译文
关键词
exascale, job reconfiguration, MPI, data redistribution, resource management, adaptive workloads
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要