NetEC: Accelerating Erasure Coding Reconstruction With In-Network Aggregation

IEEE Transactions on Parallel and Distributed Systems(2022)

引用 8|浏览24
暂无评分
摘要
In distributed storage systems, Erasure Coding (EC) is a crucial technology to enable high data availability. By downloading parity data from survived machines, EC can reconstruct lost data with much lower storage overheads than data replication. However, this reduction in storage cost comes at the expense of extra performance problems: low reconstruction rate , high degraded read latency , and high host CPU utilization . Our analysis shows that these performance problems are deeply rooted in the host-based EC processing. To resolve these problems, we present NetEC, an in-network accelerating framework that fully offloads EC to the new generation programmable switching ASICs. We propose Explicit Buffer Size Notification (EBSN) to constrain decoding buffer usage, and design an on-switch one-to-many TCP proxy to integrate EBSN with TCP. We also design two parallel Galois Field (GF) offloading methods—table lookup and bitmatrix methods—to maximize parsable bytes. We implement NetEC on programmable switches and integrate it with HDFS. Extensive evaluations show that NetEC improves the reconstruction rate by 2.7x-6.8x, reduces the degraded read latency significantly, and removes the host CPU overhead completely. We also emulate multi-rack scenarios and show that NetEC is able to support $\sim$ GB/s reconstruction rate and tens of concurrent tasks.
更多
查看译文
关键词
Erasure coding,distributed storage sytems,programmable switch,software-defined networks
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要