Utilizing Multiple Xeon Phi Coprocessors on One Compute Node.
Algorithms and Architectures for Parallel Processing, ICA3PP 2014, Pt. II (2014)
Abstract
Future exascale systems are expected to adopt compute nodes that incorporate many accelerators. This paper therefore investigates how to program multiple Xeon Phi coprocessors that reside in a single compute node. Besides a standard MPI-OpenMP programming approach, which belongs to the symmetric usage mode, two offload-mode programming approaches are considered. The first offload approach is conventional and uses compiler pragmas, whereas the second is new and combines Intel's APIs for the coprocessor offload infrastructure (COI) and the symmetric communication interface (SCIF) for low-latency communication. While the pragma-based approach allows simpler programming, the COI-SCIF approach has three advantages: (1) lower overhead associated with launching offloaded code, (2) higher data transfer bandwidths, and (3) more advanced asynchrony between computation and data movement. The low-level COI-SCIF approach is also shown to have benefits over its MPI-OpenMP counterpart. All the programming approaches are tested on a real-world 3D application, for which the COI-SCIF approach performs best on a Tianhe-2 compute node with three Xeon Phi coprocessors.
Keywords
Data Transfer, Programming Approach, Code Block, Compute Node, Many Integrated Core