Fabsim-X: A Simulation Framework for the Analysis of Large-Scale Topologies and Congestion Control Protocols in Data Center Networks

Malek Musleh, Roberto Peñaranda, Allister Alemania, Pedro Yebenes Segura, Gene Wu, Jan Zielinski, Krzysztof Raszkowski, Nan Ni, Scott Diesing, Anupama Kurpad, Ram Huggahalli, Curt E. Bruns, Steven Miller, Sujoy Sen

2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), 2020

Abstract
The explosive growth of cloud computing and of data center systems overall has placed unprecedented demand on system architects and designers to continuously develop more complex system networks that can satisfy the insatiable appetite to process, move, and store large amounts of data. Nonlinear system behavior caused by emerging workloads and use cases, varying end-to-end congestion protocols, and heterogeneity in the compute and storage capabilities of custom-designed accelerators further compounds the design problem. Modern simulation methodologies lack a cohesive and efficient framework to address the interoperability of these intersecting layers at scale. In this paper, we present a simulation framework for evaluating congestion control protocols. Furthermore, we present a set of optimizations that enable analysis over longer simulated times and at network scales of up to 128K nodes, which is vital for proper analysis of workloads that require long run times (e.g., AI training) or that are known to have scaling issues (e.g., RDMA). Specifically, we evaluate congestion control performance at various scales, study the implications of topology scaling on congestion, and assess the performance impact of running heterogeneous protocols simultaneously.
Keywords
Performance Simulation,Networking,Congestion Control,Fat-Trees,TCP,iWARP,RoCEv2