From RDMA to RDCA: Toward High-Speed Last Mile of Data Center Networks Using Remote Direct Cache Access

arxiv(2023)

引用 0|浏览72
暂无评分
摘要
In this paper, we conduct systematic measurement studies to show that the high memory bandwidth consumption of modern distributed applications can lead to a significant drop of network throughput and a large increase of tail latency in high-speed RDMA networks.We identify its root cause as the high contention of memory bandwidth between application processes and network processes. This contention leads to frequent packet drops at the NIC of receiving hosts, which triggers the congestion control mechanism of the network and eventually results in network performance degradation. To tackle this problem, we make a key observation that given the distributed storage service, the vast majority of data it receives from the network will be eventually written to high-speed storage media (e.g., SSD) by CPU. As such, we propose to bypass host memory when processing received data to completely circumvent this performance bottleneck. In particular, we design Lamda, a novel receiver cache processing system that consumes a small amount of CPU cache to process received data from the network at line rate. We implement a prototype of Lamda and evaluate its performance extensively in a Clos-based testbed. Results show that for distributed storage applications, Lamda improves network throughput by 4.7% with zero memory bandwidth consumption on storage nodes, and improves network throughput by up 17% and 45% for large block size and small size under the memory bandwidth pressure, respectively. Lamda can also be applied to latency-sensitive HPC applications, which reduces their communication latency by 35.1%.
更多
查看译文
关键词
data center networks
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要