Detecting skewness of big spatial data in SpatialHadoop.

SIGSPATIAL/GIS (2018)

Abstract
In recent years, several extensions of the Hadoop system have been proposed for dealing with spatial data, and SpatialHadoop is one of them. In the MapReduce paradigm, a task is parallelized by partitioning the data into chunks, performing the same operation on each chunk, and combining the partial results at the end. The partitioning technique therefore has a strong effect on the performance of a parallel execution, because it determines how well balanced the map tasks are. A regular grid is cheap to build, but it might not be the right choice for datasets with a skewed distribution; such datasets call for other partitioning techniques, which are more expensive to build. This paper illustrates an approach, based on the box counting function, for detecting the degree of skewness of a spatial dataset. Moreover, given the degree of skewness and some experimental observations, a heuristic is sketched for deciding which partitioning technique to apply so as to improve the performance of subsequent operations as much as possible.
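The abstract does not spell out the box counting function, so the following is only a minimal sketch under a commonly used formulation, which is an assumption here: overlay a grid of cell side r on the data, count the points in each cell, and sum the q-th powers of the counts. The slope of log BC_q(r) against log r then serves as a skewness indicator. For q = 2 and uniformly distributed 2D points the slope is close to 2, and it drops as the data concentrates in fewer cells. The names `box_count` and `skewness_exponent`, the synthetic datasets, and the chosen cell sides are all hypothetical and not taken from the paper or from SpatialHadoop.

```python
import math
import random
from collections import Counter


def box_count(points, r, q=2):
    """Sum of (points per cell)**q over the occupied cells of a grid with side r.

    Assumed formulation of the box counting function; not quoted from the paper.
    """
    cells = Counter((math.floor(x / r), math.floor(y / r)) for x, y in points)
    return sum(count ** q for count in cells.values())


def skewness_exponent(points, sides, q=2):
    """Least-squares slope of log(box_count) versus log(cell side)."""
    xs = [math.log(r) for r in sides]
    ys = [math.log(box_count(points, r, q)) for r in sides]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic data for illustration only (seeded for repeatability).
rng = random.Random(0)
uniform = [(rng.random(), rng.random()) for _ in range(20000)]
clustered = [(rng.gauss(0.5, 0.02), rng.gauss(0.5, 0.02)) for _ in range(20000)]

# Hypothetical range of cell sides over which the slope is fitted.
sides = [1 / 4, 1 / 8, 1 / 16, 1 / 32]

e_uniform = skewness_exponent(uniform, sides)
e_clustered = skewness_exponent(clustered, sides)
print(f"uniform exponent:   {e_uniform:.2f}")
print(f"clustered exponent: {e_clustered:.2f}")
```

In this sketch the uniform dataset yields an exponent near 2, while the tightly clustered one yields a much smaller value. A heuristic of the kind the abstract mentions could compare such an exponent against experimentally chosen thresholds to pick between a regular grid and a costlier partitioning technique; no threshold values are given in the abstract, so none are assumed here.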
Keywords
SpatialHadoop, Skewed data, Partitioning, MapReduce, BigData