A comprehensive repair scheme for distributed storage systems

Junmei Chen,Zongpeng Li,Guang Fang, Yeqiao Hou, Xianglong Li

COMPUTER NETWORKS(2023)

引用 0|浏览0
暂无评分
摘要
Modern data storage systems apply erasure codes to provide data reliability efficiently. Previous studies proposed a series of techniques to weigh repair/storage costs, reduce codec complexity, minimize repair time, improve fault tolerance, and enforce system-level service level agreement. These techniques have been designed in isolation, leading to performance limitations. We explore the potential advantages of combining these techniques to meet data storage systems' requirements better and provide superior system performance. This work proposes a comprehensive repair scheme for fault data in distributed storage systems. First, we tailor design erasure codes in the presence of heterogeneity of storage devices. The core idea is to monitor device performance (e.g., access speed, reliability), compute two coefficients for each device, and use them to select the appropriate devices to create stripes of erasure codes. Second, we leverage the system hierarchy to perform intermediary repair operations, further minimizing cross-rack repair bandwidth. Finally, we propose a new repair scheme adapted to the skew of data access. To demonstrate the effectiveness of our comprehensive repair scheme, we evaluate various erasure codes via mathematical analysis and experiments in the Ceph cluster. In the mise-en-scene of traditional re-encoding methods and more recent adaptive erasure codes, our scheme stands out with significant savings in recovery bandwidth, code-switching bandwidth, repair time, and code-switching time.
更多
查看译文
关键词
Distributed storage system,Erasure code,Data reliability,Heterogeneous,Cross-rack,Access skew
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要