Primary Data Deduplication - Large Scale Study and System Design

Ahmed El-Shimi, Ran Kalach, Ankit Kumar, Adi Oltean, Jin Li, Sudipta Sengupta

USENIX ATC '12: Proceedings of the 2012 USENIX Annual Technical Conference (2012)

Abstract
We present a large-scale study of primary data deduplication and use the findings to drive the design of a new primary data deduplication system implemented in the Windows Server 2012 operating system. File data was analyzed from 15 globally distributed file servers hosting data for over 2,000 users in a large multinational corporation. The findings are used to arrive at a chunking and compression approach that maximizes deduplication savings while minimizing the generated metadata and producing a uniform chunk size distribution. Scaling of deduplication processing with data size is achieved using a RAM-frugal chunk hash index and data partitioning, so that memory, CPU, and disk-seek resources remain available to fulfill the primary workload of serving I/O. We present the architecture of the new primary data deduplication system and evaluate the deduplication performance and chunking aspects of the system.
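To make the pipeline sketched in the abstract concrete, the following is a minimal illustration of content-defined chunking feeding a chunk-hash index. It is a sketch under stated assumptions, not the paper's implementation: the chunker is a generic Gear-style rolling hash rather than the paper's own chunking algorithm, the chunk-size parameters (MIN_CHUNK, AVG_CHUNK, MAX_CHUNK) are made up for illustration, and the in-memory dict stands in for the paper's RAM-frugal, disk-backed index.

```python
import hashlib
import random

# Illustrative parameters only; the paper tunes chunk sizes to balance
# deduplication savings against metadata volume.
MIN_CHUNK, AVG_CHUNK, MAX_CHUNK = 2048, 8192, 65536
MASK = AVG_CHUNK - 1  # boundary fires with probability ~1/AVG_CHUNK

random.seed(0)
GEAR = [random.getrandbits(64) for _ in range(256)]  # rolling-hash table

def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of content-defined chunks."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        # Declare a boundary when the hash matches the mask (after the
        # minimum size) or the maximum chunk size forces a cut.
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            yield start, i + 1
            start, h = i + 1, 0
    if start < len(data):
        yield start, len(data)

class ChunkStore:
    """Toy chunk store: SHA-256 digest -> chunk bytes.

    A plain dict, used here only to show the dedup bookkeeping; the
    paper's index is RAM-frugal and backed by disk.
    """
    def __init__(self):
        self.index = {}

    def deduplicate(self, data: bytes):
        """Store unique chunks; return the file as a list of digests."""
        recipe = []
        for s, e in chunk_boundaries(data):
            digest = hashlib.sha256(data[s:e]).digest()
            self.index.setdefault(digest, data[s:e])  # store only if new
            recipe.append(digest)
        return recipe

    def reassemble(self, recipe):
        return b"".join(self.index[d] for d in recipe)

if __name__ == "__main__":
    store = ChunkStore()
    file1 = bytes(random.getrandbits(8) for _ in range(100_000))
    file2 = file1[:50_000] + b"inserted bytes" + file1[50_000:]
    store.deduplicate(file1)
    recipe2 = store.deduplicate(file2)
    assert store.reassemble(recipe2) == file2
```

Because chunk boundaries are derived from content rather than fixed offsets, the insertion in the second file shifts data without invalidating most shared chunks; that resilience is what makes content-defined chunking effective for the deduplication savings the study measures.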
Keywords
primary data deduplication, deduplication system, deduplication savings, deduplication performance, deduplication processing, data size, primary workload, RAM-frugal chunk hash index, chunking, large-scale study, system design