Efficient Parallel Skyline Evaluation Using MapReduce.
IEEE Trans. Parallel Distrib. Syst. (2016)
Abstract
This research develops an advanced two-phase MapReduce solution that efficiently answers skyline queries on large datasets. Unlike existing parallel skyline approaches, our scheme treats data partitioning, filtering, and parallel skyline evaluation as one holistic query process. In the first phase, we apply filtering techniques and angle-based partitioning: unqualified objects are discarded, and the remaining objects are partitioned by their angles to the origin. In the second phase, the local skyline objects of each partition are calculated in parallel, and the global skyline objects are output after a merging skyline process. To improve the parallel local skyline calculation, we propose two partition-aware filtering methods that keep the number of skyline candidates balanced across partitions. Aggressive partition-aware filtering aggressively eliminates objects in the partition with the largest population of candidate objects, whereas proportional partition-aware filtering proportionally slows the growth of each partition's population. Recognizing the lack of studies that incorporate the MapReduce framework into parallel skyline processing, we also propose a partial-presort grid-based partition skyline algorithm that significantly improves the merging skyline computation on large datasets. The presort can be completed in the shuffle phase with little overhead. Our experimental results show the efficiency and effectiveness of the proposed MapReduce-based parallel skyline solution on large-scale datasets.
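The two-phase flow described above can be illustrated with a minimal, sequential Python sketch. It assumes minimization in every dimension, 2-D data with non-negative coordinates, and equal-width angular sectors. The `prefilter` step (discard points dominated by the point with the smallest coordinate sum) is a hypothetical stand-in for the paper's filtering techniques, and the partition-aware filtering and partial-presort grid-based merge are not modeled. MapReduce itself is only simulated: each partition's local skyline would run in its own reducer.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q (minimization): no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Block-nested-loop skyline: keep a window of points not dominated so far."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated, discard it
        window = [w for w in window if not dominates(p, w)]  # evict points p dominates
        window.append(p)
    return window


def prefilter(points: List[Point]) -> List[Point]:
    """Hypothetical filter: drop points dominated by the minimum-coordinate-sum point.
    That pivot is always a skyline point, so no true skyline point is lost."""
    if not points:
        return []
    pivot = min(points, key=sum)
    return [p for p in points if not dominates(pivot, p)]


def angle_partition(points: List[Point], n_parts: int) -> Dict[int, List[Point]]:
    """Map-phase sketch: assign each point to an equal-width sector by its angle to the origin."""
    parts: Dict[int, List[Point]] = {i: [] for i in range(n_parts)}
    for x, y in points:
        theta = math.atan2(y, x)  # in [0, pi/2] for non-negative coordinates
        idx = min(int(theta / (math.pi / 2) * n_parts), n_parts - 1)
        parts[idx].append((x, y))
    return parts


def two_phase_skyline(points: List[Point], n_parts: int = 4) -> List[Point]:
    # Phase 1: filter, then partition by angle.
    parts = angle_partition(prefilter(points), n_parts)
    # Phase 2: local skylines (parallel reducers in MapReduce), then a global merge.
    local = [skyline(ps) for ps in parts.values()]
    merged = [p for ls in local for p in ls]
    return skyline(merged)


pts = [(1, 9), (2, 5), (3, 3), (5, 2), (9, 1), (4, 4), (6, 6), (2, 8), (8, 3)]
print(sorted(two_phase_skyline(pts)))  # [(1, 9), (2, 5), (3, 3), (5, 2), (9, 1)]
```

The merge is correct because a point that no object dominates is, in particular, not dominated within its own partition, so every global skyline point survives as a local skyline point and the skyline of the union of local skylines equals the global skyline.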
Keywords
Partitioning algorithms,Indexes,Algorithm design and analysis,Merging,Query processing,Complexity theory,Peer-to-peer computing