Performance Characterization of Network Mechanisms for Non-Contiguous Data Transfers in MPI

2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2020)

Abstract
Message Passing Interface (MPI) is a widely used parallel programming model for developing parallel scientific applications. The complexity of the data handled by scientific applications often results in its placement in non-contiguous locations in memory. To handle such complex, non-contiguous data, domain scientists often use the user-defined datatypes that the MPI standard supports through Derived Data Types (DDTs). Traditionally, popular implementations of the MPI standard have used simple schemes to "pack" and "unpack" non-contiguous data to and from contiguous memory regions before and after communication operations. Meanwhile, vendors of high-performance interconnects have introduced several hardware-offloaded schemes to perform optimized transfers of non-contiguous data. Although researchers have characterized the performance of non-contiguous transfers in the past, they have not looked deep inside the communication runtime to see where the bottlenecks lie, especially in the presence of network offload support. In this paper, we take up this challenge and evaluate different designs for non-contiguous data transfers in a particular MPI runtime using our synthetic benchmarks. We consider the following designs: 1) pack-unpack-based RDMA transfer, 2) User-mode Memory Registration (UMR)-based RDMA transfer, 3) pipelined transfer, and 4) scatter-gather list (SGL)-based transfer. For each design, we measure the impact of a) serialization, b) memory registration, c) packing, and d) additional send-side overheads. From these evaluations, we explain why MPI runtimes may not meet performance expectations for DDTs, and when DDT-based implementations are worth using.
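The paper's benchmarks are not reproduced here, but the contrast between the DDT path and the pack-unpack path can be made concrete with a minimal sketch in standard C/MPI. It describes a strided matrix column with MPI_Type_vector, sends it once directly as a derived datatype (letting the runtime choose internal packing, pipelining, or a network offload such as UMR/SGL where available), and once via explicit MPI_Pack/MPI_Unpack through a contiguous staging buffer. The matrix dimensions, ranks, and tags are illustrative choices, not values from the paper.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define ROWS 4
#define COLS 8

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Row-major matrix: elements of one column are non-contiguous,
     * separated by a stride of COLS doubles. */
    double matrix[ROWS][COLS];

    /* Derived datatype describing one column: ROWS blocks of 1 double,
     * with a stride of COLS doubles between blocks. */
    MPI_Datatype column_t;
    MPI_Type_vector(ROWS, 1, COLS, MPI_DOUBLE, &column_t);
    MPI_Type_commit(&column_t);

    if (rank == 0) {
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                matrix[i][j] = i * COLS + j;

        /* (a) DDT path: hand the non-contiguous layout to MPI and let the
         * runtime decide how to move it. */
        MPI_Send(&matrix[0][2], 1, column_t, 1, 0, MPI_COMM_WORLD);

        /* (b) Pack-unpack path: explicitly serialize the column into a
         * contiguous staging buffer and send the packed bytes. */
        int packed_size, pos = 0;
        MPI_Pack_size(1, column_t, MPI_COMM_WORLD, &packed_size);
        char *packed = malloc(packed_size);
        MPI_Pack(&matrix[0][2], 1, column_t, packed, packed_size, &pos,
                 MPI_COMM_WORLD);
        MPI_Send(packed, pos, MPI_PACKED, 1, 1, MPI_COMM_WORLD);
        free(packed);
    } else if (rank == 1) {
        double col_a[ROWS], col_b[ROWS];

        /* Receive the DDT-described column as a contiguous vector. */
        MPI_Recv(col_a, ROWS, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        /* Receive the packed bytes, then unpack into a contiguous vector. */
        int packed_size, pos = 0;
        MPI_Pack_size(ROWS, MPI_DOUBLE, MPI_COMM_WORLD, &packed_size);
        char *packed = malloc(packed_size);
        MPI_Recv(packed, packed_size, MPI_PACKED, 0, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Unpack(packed, packed_size, &pos, col_b, ROWS, MPI_DOUBLE,
                   MPI_COMM_WORLD);
        free(packed);

        for (int i = 0; i < ROWS; i++)
            printf("row %d: ddt=%.0f packed=%.0f\n", i, col_a[i], col_b[i]);
    }

    MPI_Type_free(&column_t);
    MPI_Finalize();
    return 0;
}
```

Built and run with two processes (e.g., mpicc ddt_sketch.c -o ddt_sketch && mpirun -np 2 ./ddt_sketch), both receives yield the same column values; the difference the paper studies is where the serialization, memory registration, and packing costs land in each path.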
Keywords
MPI, Derived Datatypes, Network Offload, Performance Characterization