AE: An Asymmetric Extremum Content Defined Chunking Algorithm for Fast and Bandwidth-Efficient Data Deduplication
2015 IEEE Conference on Computer Communications (INFOCOM), 2015
Abstract
Data deduplication, a space-efficient and bandwidth-saving technology, plays an important role in bandwidth-efficient data transmission in various data-intensive network and cloud applications. Rabin-based and MAXP-based Content-Defined Chunking (CDC) algorithms, while robust in finding suitable cut-points for chunk-level redundancy elimination, face the key challenges of (1) low chunking throughput that renders the chunking stage the deduplication performance bottleneck and (2) large chunk-size variance that decreases deduplication efficiency. To address these challenges, this paper proposes a new CDC algorithm called the Asymmetric Extremum (AE) algorithm. The main idea behind AE is based on the observation that the extreme value in an asymmetric local range is not likely to be replaced by a new extreme value in dealing with the boundaries-shift problem, which motivates AE's use of asymmetric (rather than symmetric as in MAXP) local range to identify cut-points and simultaneously achieve high chunking throughput and low chunk-size variance. As a result, AE simultaneously addresses the problems of low chunking throughput in MAXP and Rabin and high chunk-size variance in Rabin. The experimental results based on four real-world datasets show that AE improves the throughput performance of the state-of-the-art CDC algorithms by 3x while attaining comparable or higher deduplication efficiency.
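The asymmetric-range idea can be illustrated with a short sketch. The abstract does not give implementation details, so the following are assumptions made for illustration only: raw byte values are compared directly, the extremum is a maximum, the right-hand range has a fixed length `window`, and the function names `ae_cut_point` and `ae_chunks` are hypothetical. A cut-point is declared once `window` bytes have been scanned after the current maximum without finding a strictly larger value; the range to the left of the maximum is unbounded, which is what makes the local range asymmetric.

```python
from typing import List


def ae_cut_point(data: bytes, start: int, window: int) -> int:
    """Return the exclusive end index of the chunk that begins at `start`.

    Assumption: single byte values are compared; a real implementation
    might compare multi-byte values instead.
    """
    max_value = data[start]
    max_pos = start
    i = start + 1
    while i < len(data):
        if data[i] <= max_value:
            # No larger value has appeared in the fixed range to the
            # right of the maximum, so the chunk ends here.
            if i == max_pos + window:
                return i + 1
        else:
            # A strictly larger value becomes the new extreme point.
            max_value = data[i]
            max_pos = i
        i += 1
    return len(data)  # end of input reached: emit the remainder


def ae_chunks(data: bytes, window: int = 4) -> List[bytes]:
    """Split `data` into content-defined chunks with the sketch above."""
    chunks = []
    start = 0
    while start < len(data):
        end = ae_cut_point(data, start, window)
        chunks.append(data[start:end])
        start = end
    return chunks
```

For example, `ae_chunks(bytes([3, 1, 2, 9, 1, 1, 1, 1, 5, 0]), window=2)` splits the input after the bytes at indices 2 and 5. Each byte costs one comparison and at most one assignment, with no rolling hash and no backtracking, which is consistent with the throughput argument in the abstract. Every chunk except possibly the last contains at least `window + 1` bytes.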
Keywords
AE algorithms,asymmetric extremum content defined chunking algorithm,fast data deduplication,bandwidth-efficient data deduplication,bandwidth saving technology,bandwidth efficient data transmission,cloud applications,content defined chunking algorithms,CDC algorithm,asymmetric extremum algorithm