Network-aware worker placement for wide-area streaming analytics

Future Generation Computer Systems (2022)

Abstract
Many organizations leverage Distributed Stream Processing Systems (DSPSs) running on geographically distributed datacenters to derive insights from data generated by different users and devices, e.g., Internet of Things (IoT) devices or user clicks on a website. The worker nodes in such environments are connected through Wide Area Network (WAN) links with varying delays and bandwidths. Therefore, minimizing the execution latency of tasks on the worker nodes, while steering application traffic over links with sufficient bandwidth and lower cost, is challenging. In this paper, we formulate worker node placement for a geo-distributed DSPS network as a multi-criteria decision-making problem and propose an additive weighting-based approach to solve it. Users can prioritize worker node placement according to network-related parameters. We also propose a framework that can be integrated with existing DSPSs to execute the tasks. We evaluate our placement approach on three widely used stream processing systems (Apache Spark, Apache Storm, and Apache Flink) using three custom graphs adapted from real cloud providers, running the streaming query of the Yahoo! Streaming Benchmark on each system. Compared with other placement approaches, the experimental results show that our approach improves performance by up to 2.2x–7.2x for Spark, 1.2x–3.4x for Storm, and 1.4x–3.3x for Flink, which makes our framework suitable for practical environments.
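The abstract names the technique (simple additive weighting, SAW) but not its details. The following is a minimal Python sketch of how SAW could rank candidate worker nodes, assuming three hypothetical criteria (link latency, link bandwidth, and cost per GB) and user-supplied weights. The node names, attribute values, and the helpers `saw_scores` and `place_workers` are illustrative assumptions, not the paper's actual formulation or framework API.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate worker node with hypothetical network attributes."""
    name: str
    latency_ms: float      # cost criterion: lower is better
    bandwidth_mbps: float  # benefit criterion: higher is better
    cost_per_gb: float     # cost criterion: lower is better


def saw_scores(nodes, weights):
    """Score nodes with simple additive weighting (SAW).

    Benefit criteria are normalized as x / max(x) and cost criteria as
    min(x) / x, so every normalized value lies in (0, 1] and higher is
    always better. The score is the weighted sum of normalized values.
    """
    total = sum(weights.values())
    w = {k: v / total for k, v in weights.items()}  # rescale weights to sum to 1

    min_lat = min(n.latency_ms for n in nodes)
    max_bw = max(n.bandwidth_mbps for n in nodes)
    min_cost = min(n.cost_per_gb for n in nodes)

    scores = {}
    for n in nodes:
        scores[n.name] = (
            w["latency"] * (min_lat / n.latency_ms)
            + w["bandwidth"] * (n.bandwidth_mbps / max_bw)
            + w["cost"] * (min_cost / n.cost_per_gb)
        )
    return scores


def place_workers(nodes, weights, k):
    """Return the names of the k highest-scoring candidate nodes."""
    scores = saw_scores(nodes, weights)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical candidates; values are made up for illustration only.
nodes = [
    Candidate("dc-a", latency_ms=20.0, bandwidth_mbps=1000.0, cost_per_gb=0.02),
    Candidate("dc-b", latency_ms=80.0, bandwidth_mbps=500.0, cost_per_gb=0.01),
    Candidate("dc-c", latency_ms=40.0, bandwidth_mbps=200.0, cost_per_gb=0.05),
]

# A latency-oriented user versus a cost-oriented user.
print(place_workers(nodes, {"latency": 0.5, "bandwidth": 0.3, "cost": 0.2}, k=2))  # ['dc-a', 'dc-b']
print(place_workers(nodes, {"latency": 0.1, "bandwidth": 0.1, "cost": 0.8}, k=1))  # ['dc-b']
```

Changing only the weights changes the ranking, which illustrates how users could prioritize placement by network-related parameters. Min/max normalization is one common SAW variant; the paper may normalize or weight criteria differently.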
Keywords
Internet of Things (IoT), Worker node placement, Wide-area stream analytics, Stream processing, Simple additive weighting, Wide Area Network (WAN)