PART: Pinning Avoidance in RDMA Technologies

2020 14th IEEE/ACM International Symposium on Networks-on-Chip (NOCS), 2020

Abstract
State-of-the-art Remote Direct Memory Access (RDMA) engines pin communication buffers, complicating the programming model, limiting memory utilization, and mandating a separate memory translation subsystem spanning the network interface card and the OS. In this paper, we introduce PART, a page fault handling mechanism suitable for emerging nodes that integrate the network interface (NI) with the main processor. PART does not need to pin pages, so any process buffer can be used for communication, and it resolves occasional page faults dynamically, when the network accesses the memory, by reusing the RDMA transport. Additionally, PART leverages the I/O Memory Management Unit (IOMMU), located next to the processor, to translate virtual to physical addresses, thus reducing cost and complexity. We implement and evaluate PART in a cluster of 16 nodes and 64 ARM cores. We evaluate transfer performance under varying page fault frequency, and examine optimizations that proactively page in all pages upon the first page fault or ahead of the transfer, providing useful insights that can be used to optimize runtimes. Our results show that PART completes one-page transfers with a minor page fault at the destination in approximately 38 μs, while the slowdown on 1 MB transfers that experience faults in all pages is as little as 2.6x compared to the no-page-fault case. Page faults are expected to be rare in HPC setups: the performance of LAMMPS in our cluster is virtually unaffected when pages are handled dynamically using PART.
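To illustrate why the proactive "page in everything on the first fault" optimization mentioned in the abstract matters, the following toy latency model contrasts it with resolving each fault individually. This is a hypothetical sketch for intuition only: the page size, per-page transfer cost, and fault-resolution cost are assumed parameters, not measurements from the paper, and the overlap assumption is a simplification.

```python
# Toy latency model (illustrative only; all constants are hypothetical,
# not measurements from PART) contrasting per-page fault resolution with
# the proactive "page in all pages on the first fault" optimization.

PAGE_SIZE = 4096          # bytes; assumed page size
XFER_PER_PAGE_US = 1.0    # hypothetical wire/DMA time per page (us)
FAULT_COST_US = 30.0      # hypothetical cost to resolve one minor fault (us)

def transfer_time_us(total_bytes, faulting_fraction, proactive=False):
    """Estimated completion time for an RDMA transfer whose destination
    pages may be unmapped and must be paged in when the NI touches them."""
    pages = max(1, total_bytes // PAGE_SIZE)
    faulting = int(pages * faulting_fraction)
    base = pages * XFER_PER_PAGE_US
    if faulting == 0:
        return base
    if proactive:
        # The first fault triggers paging in the whole buffer; the
        # remaining page-ins are assumed to overlap with the transfer.
        return base + FAULT_COST_US
    # Otherwise every faulting page stalls the transfer individually.
    return base + faulting * FAULT_COST_US

if __name__ == "__main__":
    mb = 1 << 20
    print(f"no faults:          {transfer_time_us(mb, 0.0):.1f} us")
    print(f"per-page faults:    {transfer_time_us(mb, 1.0):.1f} us")
    print(f"proactive page-in:  {transfer_time_us(mb, 1.0, True):.1f} us")
```

Under any parameter choice, the per-fault cost scales with the number of faulting pages in the naive scheme but is paid once in the proactive scheme, which is the intuition behind the optimization the abstract evaluates.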
Keywords
page faults,RDMA,IOMMU,MPI,low-power ARM processors