Optimizing data partition for scaling out NoSQL cluster

Concurrency and Computation: Practice and Experience (2015)

Abstract
Data partitioning significantly affects the performance of Not Only SQL (NoSQL) systems. Many peer-to-peer NoSQL systems now use consistent hashing to partition data automatically. These systems use virtual nodes and random data placement to divide the consistent hashing ring, which can lead to imbalanced data partitions and degrade overall system performance. The problem is especially prominent when scaling out heterogeneous clusters. This paper first proposes an imbalance coefficient of data distribution for a cluster that takes the capacity of each node into account. Based on this coefficient, we propose a dynamic programming algorithm that calculates the position of a newly added node in the consistent hashing ring, expanding the ring more evenly without reshuffling the entire dataset. Simulations and experiments on Cassandra with the Yahoo! Cloud Serving Benchmark (YCSB) show that our algorithm outperforms the state-of-the-art work. Copyright © 2015 John Wiley & Sons, Ltd.
Keywords
consistent hashing,data partition,NoSQL,heterogeneous nodes
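
The imbalance the abstract attributes to randomly placed virtual nodes can be illustrated with a minimal sketch. It builds a consistent hashing ring, measures the fraction of the ring each physical node owns, and compares that with each node's share of total capacity. Every name here (`token`, `build_ring`, `ownership`, `lookup`, `imbalance`), the use of MD5, and the 2^32 ring size are assumptions made for illustration. In particular, `imbalance` is a simple stand-in measure, not the imbalance coefficient defined in the paper, and the paper's dynamic programming placement algorithm is not reproduced.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 32  # illustrative ring size, not taken from the paper


def token(key: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE


def build_ring(nodes, vnodes_per_node):
    """Return sorted (token, node) pairs with pseudo-random virtual-node tokens."""
    ring = [(token(f"{n}#{i}"), n) for n in nodes for i in range(vnodes_per_node)]
    ring.sort()
    return ring


def ownership(ring):
    """Fraction of the ring owned by each node.

    A virtual node owns the arc that ends at its token and starts just after
    the previous token on the ring (wrapping around at the first entry).
    """
    shares = {}
    for idx, (tok, node) in enumerate(ring):
        if len(ring) == 1:
            arc = RING_SIZE  # a single token owns the whole ring
        else:
            arc = (tok - ring[idx - 1][0]) % RING_SIZE
        shares[node] = shares.get(node, 0) + arc
    return {n: s / RING_SIZE for n, s in shares.items()}


def lookup(ring, key):
    """Return the node that owns `key`: the first token at or after its hash."""
    tokens = [t for t, _ in ring]
    idx = bisect_left(tokens, token(key)) % len(ring)
    return ring[idx][1]


def imbalance(shares, capacity):
    """Hypothetical measure: largest gap between a node's ring share and its
    share of total capacity. NOT the coefficient defined in the paper."""
    total = sum(capacity.values())
    return max(abs(shares.get(n, 0.0) - capacity[n] / total) for n in capacity)


if __name__ == "__main__":
    # Heterogeneous cluster: node "c" has twice the capacity of "a" and "b".
    capacity = {"a": 1, "b": 1, "c": 2}
    ring = build_ring(capacity, vnodes_per_node=8)
    shares = ownership(ring)
    print(shares)
    print(imbalance(shares, capacity))
```

Because every node receives the same number of randomly placed tokens regardless of capacity, the measured shares in a heterogeneous cluster generally drift away from the capacity shares, which is the situation the abstract describes.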