Masking failures from application performance in data center networks with shareable backup.

Dingming Wu, Yiting Xia, Xiaoye Steven Sun, Xin Sunny Huang, Simbarashe Dzinamarira, T. S. Eugene Ng

SIGCOMM (2018)

Abstract

Shareable backup is an economical and effective way to mask failures from application performance. A small number of backup switches are shared network-wide for repairing failures on demand so that the network quickly recovers to its full capacity without applications noticing the failures. This approach avoids complications and ineffectiveness of rerouting. We propose ShareBackup as a prototype architecture to realize this concept and present the detailed design. We implement ShareBackup on a hardware testbed. Its failure recovery takes merely 0.73 ms, causing no disruption to routing; and it accelerates Spark and Tez jobs by up to 4.1X under failures. Large-scale simulations with real data center traffic and failure model show that ShareBackup reduces the percentage of job flows prolonged by failures from 47.2% to as little as 0.78%. In all our experiments, the results for ShareBackup have little difference from the no-failure case.

Keywords

Data Center Network, Failure Recovery, Circuit Switching
