Maximizing Application Performance in a Multi-core, NUMA-Aware Compute Cluster by Multi-level Tuning.

Lecture Notes in Computer Science (2013)

Abstract
Achieving good application performance on a modern compute cluster of multi-core, multi-socket, NUMA-aware systems can be challenging. In this paper, we use VASP, a popular ab-initio quantum-mechanical MD simulation software, to investigate the various levels of software, hardware, and network tuning that boost performance on a Dell PowerEdge R815 HPC cluster with AMD "Interlagos" and "Abu-Dhabi" processors. We implement code changes with the free software stack that supports the FMA and AVX CPU instructions on the Bulldozer/Piledriver architecture. We analyze the MPI communications by profiling, compare the scalability of different interconnects, and discuss various MPI tuning parameters. We show the effects of advanced features that are crucial to the scalability of InfiniBand, including MXM and SRQ, which optimize the network resources for MPI communications. We investigate the importance of MPI process placement and introduce a process allocation tool that facilitates affinity grouping on a multi-core architecture.
Keywords
Performance,Multi-Level Tuning,VASP,AMD Bulldozer,InfiniBand,MPI