Context-aware Resemblance Detection based Deduplication Ratio Prediction for Cloud Storage

2022 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), 2022

Abstract
With the prevalence of cloud storage, people prefer to outsource their data to the cloud for flexibility and reliability. Undoubtedly, there is a great deal of redundancy among these data. However, high-end storage with deduplication incurs heavy computation and increases data management complexity. Potential customers need to know the proportion of redundancy in their outsourced data to decide whether high-end storage with deduplication is worthwhile. Thus, many researchers have attempted to predict the redundancy ratio. However, existing mechanisms ignore the redundancy among similar chunks, which contain a large amount of duplicate data. Although resemblance detection, which detects the duplicate parts among similar data, has become an active research topic, it is rarely applied to conventional deduplication ratio estimation because of its prohibitive computational cost. Therefore, we analyze the limitations and challenges of deduplication ratio prediction in terms of prediction scope and response time, and propose a novel prediction scheme. By leveraging context-aware resemblance detection and confidence interval theory, our method achieves faster estimation with higher accuracy in deduplication ratio than the state-of-the-art work. Finally, experiments on real workloads show that our method can efficiently and effectively estimate both the proportion of duplicate chunks and the proportion of redundant data among similar chunks.
Keywords
Cloud Storage, Resemblance Detection, Context-Aware, Deduplication Ratio Prediction
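
The abstract mentions combining sampling with confidence interval theory to predict the deduplication ratio, but it gives no algorithmic detail. As a rough illustration only, and not the authors' scheme, the following Python sketch estimates the fraction of exactly duplicated chunks from a random sample and attaches a normal-approximation confidence interval. The function names, the SHA-256 fingerprint, the two-pass sampling estimator, and the default z = 1.96 are all assumptions made for this example. The sketch does not cover redundancy among similar (non-identical) chunks, which the paper addresses through resemblance detection.

```python
import hashlib
import math
import random
from collections import Counter


def fingerprint(chunk: bytes) -> bytes:
    """Hypothetical chunk fingerprint; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(chunk).digest()


def estimate_duplicate_fraction(chunks, sample_size, seed=0, z=1.96):
    """Estimate the fraction of chunks that exact deduplication would remove.

    Returns (estimate, lower, upper), where the bounds form a
    normal-approximation confidence interval. Requires sample_size >= 2.
    """
    rng = random.Random(seed)

    # Pass 1: sample chunk positions uniformly, with replacement.
    sampled = [fingerprint(chunks[rng.randrange(len(chunks))])
               for _ in range(sample_size)]
    tracked = set(sampled)

    # Pass 2: count how often each sampled fingerprint occurs in the full data.
    # Only the sampled fingerprints are kept in memory.
    counts = Counter()
    for chunk in chunks:
        fp = fingerprint(chunk)
        if fp in tracked:
            counts[fp] += 1

    # A chunk that occurs c times contributes 1/c, so the sample mean
    # estimates (number of distinct chunks) / (total number of chunks).
    weights = [1.0 / counts[fp] for fp in sampled]
    unique_fraction = sum(weights) / sample_size

    # Sample variance and the half-width of the confidence interval.
    variance = sum((w - unique_fraction) ** 2 for w in weights) / (sample_size - 1)
    half_width = z * math.sqrt(variance / sample_size)

    duplicate_fraction = 1.0 - unique_fraction
    lower = max(0.0, duplicate_fraction - half_width)
    upper = min(1.0, duplicate_fraction + half_width)
    return duplicate_fraction, lower, upper


if __name__ == "__main__":
    # Toy data: 50 identical chunks plus 50 distinct chunks.
    data = [b"a"] * 50 + [bytes([i]) for i in range(50)]
    print(estimate_duplicate_fraction(data, sample_size=2000))
```

In this toy data set, 49 of the 100 chunks are redundant copies, so the estimate should land near 0.49. Increasing `sample_size` narrows the interval at the cost of more fingerprints to track.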