MVAPICH2-MIC: A High Performance MPI Library for Xeon Phi Clusters with InfiniBand

XSW '13: Proceedings of the 2013 Extreme Scaling Workshop (XSW 2013), 2013

Abstract
Intel's Xeon Phi coprocessor, based on the Many Integrated Core (MIC) architecture, delivers more than 1 TFLOP of performance on a single chip and offers x86 compatibility. While MPI libraries can run out-of-the-box on Xeon Phi coprocessors, it is critical to tune them for the new architecture and to redesign them around the new system-level features it offers in order to deliver high performance. In this paper, we discuss the tuning and redesign of the MVAPICH2 MPI library for efficient intra-node and inter-node point-to-point communication on Xeon Phi clusters with InfiniBand. We evaluate the designs using micro-benchmarks and application kernels. The results show significant improvements in the performance of intra-MIC, intra-node, and inter-node communication. For the inter-node MIC-MIC path, the latency of 4M messages is reduced by 65% and the bandwidth for the same message size is improved by a factor of 5. The designs show 50% and 16% improvements in the performance of the 3DStencil communication kernel and the P3DFFT library on 32 and 8 nodes, respectively. We discuss the challenges involved in providing a further optimized MVAPICH2 MPI library for Xeon Phi clusters.
Keywords
coprocessors,message passing