Boafft: Distributed Deduplication for Big Data Storage in the Cloud

IEEE Transactions on Cloud Computing (2020)

Abstract
As data continues to grow within data centers, cloud storage systems face ongoing challenges in saving storage capacity and in moving big data within an acceptable time frame. In this paper, we present Boafft, a cloud storage system with distributed deduplication. Boafft achieves scalable throughput and capacity by using multiple data servers to deduplicate data in parallel, with minimal loss of deduplication ratio. First, Boafft uses an efficient data routing algorithm based on data similarity that reduces network overhead by quickly identifying the storage location. Second, Boafft maintains an in-memory similarity index in each data server, which avoids a large number of random disk reads and writes and thereby accelerates local deduplication. Third, Boafft builds a hot fingerprint cache in each data server based on access frequency, so as to improve the deduplication ratio. Our comparative analysis with EMC's stateful routing algorithm shows that Boafft provides a comparatively high deduplication ratio with low network bandwidth overhead. Moreover, Boafft makes better use of storage space, with higher read/write bandwidth and good load balance.
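The similarity-based routing idea can be illustrated with a short sketch. The following is a minimal, hypothetical example rather than Boafft's actual implementation: it assumes each write is grouped into a superchunk of content-defined chunks, takes the minimum chunk fingerprint as a min-hash-style representative feature, and maps that feature to a data server, so similar superchunks tend to land on the same server and deduplicate locally. The names route_superchunk and NUM_SERVERS are illustrative.

```python
import hashlib

NUM_SERVERS = 8  # illustrative cluster size


def chunk_fingerprint(chunk: bytes) -> int:
    """SHA-1 fingerprint of one chunk, interpreted as an integer."""
    return int.from_bytes(hashlib.sha1(chunk).digest(), "big")


def route_superchunk(chunks: list[bytes]) -> int:
    """
    Similarity-based routing sketch: use the minimum chunk fingerprint
    in the superchunk as its representative feature (a min-hash-style
    choice), then map it to a data server ID. Superchunks that share
    many chunks often share this minimum fingerprint, so their
    duplicates are routed to the same server.
    """
    representative = min(chunk_fingerprint(c) for c in chunks)
    return representative % NUM_SERVERS


# Usage: two overlapping superchunks are routed; when they share the
# chunk holding the minimum fingerprint, they go to the same server.
a = [b"block-1", b"block-2", b"block-3"]
b = [b"block-1", b"block-2", b"block-4"]
print(route_superchunk(a), route_superchunk(b))
```

A stateless scheme like this avoids querying every server's index over the network, which is the bandwidth saving the abstract attributes to similarity-based routing; the trade-off is a small loss of deduplication ratio compared with stateful routing, which the per-server similarity index and hot fingerprint cache are meant to recover.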
Keywords
Big data, cloud storage, data deduplication, data routing, file system