Enabling Efficient Erasure Coding in Disaggregated Memory Systems

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS(2024)

引用 0|浏览8
暂无评分
摘要
Disaggregated memory (DM) separates compute and memory resources to build a huge memory pool. Erasure coding (EC) is expected to provide fault tolerance in DM with low memory cost. In DM with EC, objects are first coded in compute servers, then directly written to memory servers via high-speed networks like one-sided RDMA. However, as the one-sided RDMA latency goes down to the microsecond level, coding overhead degrades the performance in DM with EC. To enable efficient EC in DM, we thoroughly analyze the coding stack from the perspective of cache efficiency and RDMA transmission. We develop MicroEC, which optimizes the coding workflow by reusing the auxiliary coding data and coordinates the coding and RDMA transmission with an exponential pipeline, as well as carefully adjusting the coding and transmission threads to minimize the latency. We implement a prototype supporting common basic operations, such as write/read/degraded read/recovery. Experiments show that MicroEC reduces the write latency by up to 44.35% and 42.14% and achieves up to 1.80 x and 1.73 write throughput, compared with the state-of-the-art DM systems with EC and 3-way replication for objects not smaller than 1 MB, respectively. For small objects, MicroEC also evidently reduces the variation of latency, e.g., it reduces the P99 latency of writing 1 KB objects by 27.81%.
更多
查看译文
关键词
Encoding,Servers,Memory management,Random access memory,Fault tolerant systems,Fault tolerance,Throughput,Disaggregated memory,erasure coding,pipeline,reliability
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要