Range Partitioning Within Sublinear Time: Algorithms And Lower Bounds

Theoretical Computer Science (2021)

Abstract
Range partitioning is a typical and widely used data partitioning method and has become a core operation in most big data computing platforms. Given an input $L$ of $N$ data items admitting a total order, the goal of range partitioning is to divide the whole input into $k$ ranges containing the same number of data items. There is a trivial lower bound of $\Omega(N)$ for exact partitioning algorithms, since they need to make at least one full scan of the whole input. In the context of big data computing, even algorithms with $O(N)$ time are not always considered efficient enough; the ultimate goal of designing algorithms on big data is usually to solve problems within sublinear time. It is therefore well motivated and important to study sublinear algorithms for the range partitioning problem.

The paper aims to answer three questions. For the internal memory (RAM) model, since a sophisticated sampling-based $(\epsilon, \delta)$-approximation partitioning algorithm with $O(k\log(N/\delta)/\epsilon^2)$ time cost has been proposed, the first question is what lower bound can be obtained for sublinear partitioning algorithms. For the external memory (I/O) model, as far as we know, no previous work gives external partitioning algorithms with a performance guarantee within sublinear time; the other two questions are therefore what upper and lower bounds can be achieved for sublinear external partitioning algorithms. To answer these questions, the paper studies lower and upper bounds for the range partitioning problem in the RAM and I/O models. For the RAM model, a lower bound of $\Omega(k(1-\delta)/\epsilon^2)$ on the cost of sampling-based partitioning algorithms is proved. For the I/O model, two lower bounds on the sampling cost required by sublinear external range partitioning algorithms are proved. They indicate that at least a full scan of the whole input is needed in the worst case, so a general sublinear external partitioning algorithm does not exist.
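The sampling-based approach in the RAM model can be illustrated with a minimal Python sketch. It is not the algorithm from the paper or from the work it refers to: it assumes uniform sampling with replacement, the natural logarithm, and a hypothetical constant `c = 1` in the sample size $s = c \cdot k\ln(N/\delta)/\epsilon^2$, and the names `sample_size`, `choose_splitters` and `range_sizes` are illustrative only. The $k-1$ splitters are taken as evenly spaced order statistics of the sorted sample, so the running time depends on $N$ only through the $\log(N/\delta)$ term.

```python
import math
import random
from bisect import bisect_right


def sample_size(k, n, eps, delta, c=1.0):
    # Assumed form s = c * k * ln(N / delta) / eps^2; c = 1 is a hypothetical constant.
    return math.ceil(c * k * math.log(n / delta) / eps ** 2)


def choose_splitters(data, k, eps, delta, rng=random):
    """Pick k - 1 splitters from a uniform random sample of `data`."""
    n = len(data)
    s = min(n, sample_size(k, n, eps, delta))
    # Uniform sampling with replacement: O(1) per draw in the RAM model.
    sample = sorted(data[rng.randrange(n)] for _ in range(s))
    # The i-th splitter is the (i * s / k)-th smallest sampled item, i = 1..k-1.
    return [sample[(i * s) // k] for i in range(1, k)]


def range_sizes(data, splitters):
    """Full scan that counts items per range; used only to check the result."""
    counts = [0] * (len(splitters) + 1)
    for x in data:
        # Range index = number of splitters <= x.
        counts[bisect_right(splitters, x)] += 1
    return counts


if __name__ == "__main__":
    data = list(range(100_000))
    splitters = choose_splitters(data, k=4, eps=0.1, delta=0.01, rng=random.Random(0))
    print(splitters, range_sizes(data, splitters))
```

With $N = 100000$, $k = 4$, $\epsilon = 0.1$ and $\delta = 0.01$ the sketch draws 6448 samples. `range_sizes` is a linear scan and is included only to check the quality of the splitters, not as part of the sublinear procedure.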
Motivated by the hard instances used in the lower-bound proofs, a model describing the input distributions of the range partitioning problem in practical applications is proposed. Finally, for the special cases described by the model, a sublinear external partitioning algorithm with $O(k\log(N/\delta)/(B\epsilon^2))$ I/O cost is designed. © 2021 Elsevier B.V. All rights reserved.
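A rough Python sketch of how block-level sampling can reduce the sampling cost to about $s/B$ I/Os. It does not reproduce the paper's input model or algorithm: it simply assumes the input is stored in random order, so that every block of $B$ consecutive items behaves like a random sample. That is exactly the kind of assumption the worst-case lower bounds show cannot be dropped in general. The disk is simulated as a list of blocks, and `read_block`, `external_splitters` and the constant `c` are hypothetical.

```python
import math
import random


def read_block(disk, j):
    # Hypothetical I/O primitive: one call models one transfer of a block of B items.
    return disk[j]


def external_splitters(disk, n, B, k, eps, delta, c=1.0, rng=random):
    """Return (splitters, number_of_block_reads).

    ASSUMPTION: items are stored in random order, so each block is a random
    sample of B items. Without such an assumption this gives no guarantee.
    """
    s = math.ceil(c * k * math.log(n / delta) / eps ** 2)  # assumed sample size
    m = min(len(disk), math.ceil(s / B))                    # about s / B I/Os
    sample, ios = [], 0
    for j in rng.sample(range(len(disk)), m):               # m distinct blocks
        sample.extend(read_block(disk, j))
        ios += 1
    sample.sort()
    t = len(sample)
    return [sample[(i * t) // k] for i in range(1, k)], ios


if __name__ == "__main__":
    B = 1000
    items = list(range(200_000))
    random.Random(7).shuffle(items)  # simulate a randomly ordered input file
    disk = [items[i:i + B] for i in range(0, len(items), B)]
    print(external_splitters(disk, len(items), B, k=4, eps=0.1, delta=0.01))
```

With $N = 200000$, $B = 1000$, $k = 4$, $\epsilon = 0.1$ and $\delta = 0.01$, the sketch reads 7 of the 200 blocks. Reading whole blocks is what divides the sample cost by $B$; the price is that the items within one block must not be correlated, which is why some input model is needed.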
Keywords
Range partitioning, Sublinear algorithm, Lower bounds, External memory model