An efficient cloud-based elastic RDMA protocol for HPC applications

Hang Cao, Cheng Xu, Yunqi Han, Muhui Lin, Kai Shen, Geng Wang, Jinhu Li, Xiangzheng Sun, Ronghui He, Liang You, Hang Yang, Xiantao Zhang

CCF Transactions on High Performance Computing (2024)

Abstract
High-performance computing (HPC) networking is of great importance in scaling many HPC applications across multiple nodes. Most HPC applications deployed on traditional supercomputers or clusters adopt RDMA protocols such as InfiniBand for inter-node networking to mitigate high latency during constant communication. As cloud-based HPC continues to emerge as a significant trend, utilizing RDMA in the cloud has become a challenging problem. To address this problem, we propose an efficient elastic RDMA protocol (eRDMA) that brings RDMA's merits to HPC applications in the cloud. eRDMA applies the direct data movement (DDM) of the cloud infrastructure processing unit (CIPU), the overlay of the virtual private cloud (VPC), and compatibility with RDMA verbs to fully utilize elastic resources together with the features of an RDMA network for HPC scenarios in the cloud. The effectiveness of eRDMA is demonstrated by experimental results across different platforms for many HPC and general TCP applications.
Keywords
RDMA, HPC applications, Cloud computing, Elastic networking