SRNIC: A Scalable Architecture for RDMA NICs.

Zilong Wang,Layong Luo,Qingsong Ning,Chaoliang Zeng,Wenxue Li,Xinchen Wan, Peng Xie, Tao Feng, Ke Cheng,Xiongfei Geng, Tianhao Wang, Weicheng Ling, Kejia Huo, Pingbo An, Kui Ji, Shideng Zhang, Bin Xu, Ruiqing Feng, Tao Ding,Kai Chen,Chuanxiong Guo

NSDI(2023)

引用 0|浏览55
暂无评分
摘要
RDMA is expected to be highly scalable: to perform well in large-scale data center networks where packet losses are inevitable (i.e., high network scalability), and to support a large number of performant connections per server (i.e., high connection scalability). Commercial RoCEv2 NICs (RNICs) fall short on scalability as they rely on a lossless, limited-scale network fabric and support only a small number of performant connections. Recent work IRN improves the network scalability by relaxing the lossless network requirement, but the connection scalability issue remains unaddressed. In this paper, we aim to address the connection scalability challenge, while maintaining high performance and low CPU overhead as commercial RNICs, and high network scalability as IRN, by designing SRNIC, a Scalable RDMA NIC architecture. Our key insight in SRNIC is that, on-chip data structures and their memory requirements in RNICs can be minimized with careful protocol and architecture co-designs to improve connection scalability. Guided by this insight, we analyze all data structures involved in an RDMA conceptual model, and remove them as many as possible with RDMA protocol header modifications and architectural innovations, including cache-free QP scheduler and memory-free selective repeat. We implement a fully functional SRNIC prototype using FPGA. Experiments show that, SRNIC achieves 10K performant connections on chip and outperforms commercial RNICs by 18x in terms of normalized connection scalability (i.e., the number of performant connections per 1MB memory), while achieving 97 Gbps throughput and 3.3 mu s latency with less than 5% CPU overhead, and maintaining high network scalability.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要