Effects and Benefits of Node Sharing Strategies in HPC Batch Systems

2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)(2019)

Abstract
Processor manufacturers today scale performance by increasing the number of cores on each CPU. Unfortunately, not all HPC applications can efficiently saturate all cores of a single node, even if they successfully scale to thousands of nodes. For these applications, sharing nodes with other applications can help stress different resources on the nodes and thus use them more efficiently. Previous work has shown that the performance impact of node sharing is highly application-dependent, but very little work has studied its effects within batch systems and for complex parallel application mixes. Administrators therefore typically fear the complexity of running a batch system that supports node sharing, and also fear that interference between co-allocated jobs leads to worse performance in practice. This paper focuses on sharing nodes by oversubscribing cores through hyper-threading. We introduce new node sharing strategies for batch systems by deriving extensions to the well-known backfill and first fit algorithms. These strategies have been implemented in the SLURM workload manager, and the evaluation is based on the NERSC Trinity scientific mini-applications. The evaluation of our node sharing strategies shows no overhead when using co-allocation, but an increased computational efficiency of 19% and an increased scheduling efficiency of 25.2% compared to standard node allocation.
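To make the idea of extending first fit with hyper-thread co-allocation concrete, here is a minimal toy sketch. It is not the paper's algorithm or SLURM's implementation. It assumes (hypothetically) that each co-allocated job occupies one hardware thread on every physical core of a node, so a node with `threads_per_core = 2` can host at most two jobs. The names `Node`, `Job`, `first_fit`, and the `share` flag are illustrative only, and the sketch ignores interference, memory limits, and per-job opt-in to sharing.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    cores: int                 # physical cores (not used by this toy placement rule)
    threads_per_core: int = 2  # assumed hyper-threads per core
    jobs: List[str] = field(default_factory=list)

    def free_layers(self) -> int:
        # Assumption: each job uses one hardware thread per core, so the node
        # can host at most threads_per_core jobs at once.
        return self.threads_per_core - len(self.jobs)


@dataclass
class Job:
    name: str
    num_nodes: int


def first_fit(nodes: List[Node], job: Job, share: bool = True) -> Optional[List[Node]]:
    """Scan nodes in order and take the first job.num_nodes that fit.

    share=False models standard exclusive allocation (idle nodes only);
    share=True also accepts nodes that still have a free hyper-thread layer.
    Nodes are only modified if the whole request can be satisfied.
    """
    chosen: List[Node] = []
    for node in nodes:
        fits = node.free_layers() > 0 if share else not node.jobs
        if fits:
            chosen.append(node)
            if len(chosen) == job.num_nodes:
                for n in chosen:
                    n.jobs.append(job.name)
                return chosen
    return None  # not enough suitable nodes; the job would have to wait


if __name__ == "__main__":
    cluster = [Node(f"n{i}", cores=32) for i in range(4)]
    first_fit(cluster, Job("A", 3), share=False)         # A gets n0, n1, n2
    print(first_fit(cluster, Job("B", 2), share=False))  # None: only n3 is idle
    placed = first_fit(cluster, Job("B", 2), share=True)
    print([n.name for n in placed])                      # ['n0', 'n1']
```

With exclusive allocation, job B has to wait because only one node is idle; with sharing enabled, it starts immediately on the hyper-thread siblings of nodes already running A. That is the utilization effect node sharing aims for, at the cost of possible contention between A and B.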
Keywords
node sharing, resource contention, co-allocation, batch scheduling