I/O Causality Based In-line Data Deduplication for Non-Volatile Memory Enabled Storage Systems

IEEE Transactions on Computers(2024)

引用 0|浏览4
暂无评分
摘要
Data deduplication technologies are widely exploited to reduce capacity demands for storage. Previous chunk-based offline deduplication technologies often cause serious performance overhead due to data chunking and indexing. Particularly, they are not efficient for non-volatile memory (NVM) based storage systems because they cannot fully exploit the byte-addressability feature of NVMs for fine-grained deduplication. In this paper, we propose I/O Causality based In-line Deduplication (ICID) to maximize the deduplication ratio for NVM-based storage systems. Unlike previous inline deduplication schemes that use hash indexes to identify duplicate data slices, ICID records memory-copy operations in a B-tree structure to achieve causality-based inline deduplication. We propose two novel techniques to manage memory-copy records in the B-tree efficiently. First, to speed up the B-tree lookup, we group memory-copy records targeted to the same page in a B-tree node to improve data locality. Second, we exploit the spatial locality of memory accesses to identify outdated memory-copy records, and delete them in time to reduce memory consumption of the B-tree. We evaluate ICID in a system equipped with Intel Optane DC Persistent Memory Modules. For a typical KV store–LevelDB, our experimental results show that ICID achieves up to 16higher deduplication ratio and reduces the time cost of data deduplication by 47% on average compared with state-of-the-art deduplication schemes.
更多
查看译文
关键词
Data Deduplication,I/O Causality,Non-Volatile Memory
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要