HKS: Efficient Data Partitioning for Stateful Streaming.

Adeel Aslam,Giovanni Simonini,Luca Gagliardelli,Angelo Mozzillo,Sonia Bergamaschi

DaWaK（2023）

引用 0|浏览1

暂无评分

摘要

Data partitioning among processing instances of distributed stream processing systems (DSPSs) plays a significant role in the performance of overall stream processing. Several data partitioning schemes, including round-robin and hash-based key-splitting strategies, are employed in this context. However, stateful operations introduce challenges such as data aggregation overhead and load imbalance among processing instances due to the skewed nature of real data. In this paper, we propose a partitioning strategy (HKS) that considers the popularity of the tuples on the fly and partitions them according to their frequency: higher frequent tuples are routed by employing power-of-the-two-choices, whereas low ones by using a single hash function. We perform a comprehensive experimental evaluation on synthetic and real-world data sets on well-known Apache Storm DSPS. Results demonstrate the superior performance of the HKS against state-of-the-art data partitioning schemes in terms of load imbalance and aggregation cost.

查看译文

关键词

stateful streaming,efficient data partitioning

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要