Efficient Intra-Node Communication On Intel-Mic Clusters

CCGRID '13: Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing(2013)

引用 26|浏览46
暂无评分
摘要
Accelerators and coprocessors have become a key component in modern supercomputing systems due to the superior performance per watt that they offer. Intel's Xeon Phi coprocessor packs up to 1 TFLOP of double precision performance in a single chip while providing x86 compatibility and supporting popular programming models like MPI and OpenMP. This makes it an attractive choice for accelerating HPC applications. The Xeon Phi provides several channels for communication between MPI processes running on the coprocessor and the host. While supporting POSIX shared memory within the coprocessor, it exposes a low level API called the Symmetric Communication Interface (SCIF) that gives direct control of the DMA engine to the user. SCIF can also be used for communication between the coprocessor and the host. Xeon Phi also provides an implementation of the InfiniBand (IB) Verbs interface that enables a direct communication link with the InfiniBand adapter for communication between the coprocessor and the host. In this paper, we propose and evaluate design alternatives for efficient communication on a node with Xeon Phi coprocessor. We incorporate our designs in the popular MVAPICH2 MPI library. We use shared memory, IB Verbs and SCIF to design a hybrid solution that improves the MPI communication latency from Xeon Phi to the Host by 70%, for 4MByte messages, compared to an out-of-the-box version of MVAPICH2. Our solution delivers more than 6x improvement in peak uni-directional bandwidth from Xeon Phi to the Host and more than 3x improvement in bi-directional bandwidth. Through our designs, we are able to improve the performance of 16 process Gather, Alltoall and All gather collective operations by 70%, 85% and 80%, respectively, for 4MB messages. We further evaluate our designs using application benchmarks and show improvements of up to 18% with a 3D Stencil kernel and up to 11.5% with the P3DFFT library.
更多
查看译文
关键词
MPI,Xeon Phi,Intra-node Communication,SCIF,InfiniBand,Clusters
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要