GT-scheduler: a hybrid graph-partitioning and tabu-search based task scheduler for distributed data stream processing systems

Cluster Computing (2024)

Abstract
The continual increase in the amount of data generated by social media, IoT devices, and monitoring systems has motivated the use of Distributed Data Stream Processing (DSP) systems to harness data in real time. Scheduling the processing tasks of a DSP system across the machines of a cluster or cloud environment is an NP-hard problem. Various scheduling schemes have been proposed to address it, but most fail to account for runtime adaptation and for workload changes after the initial scheduling. In this paper, we propose a new scheduler, GT-Scheduler, that uses a heuristic, rule-based algorithm to schedule tasks with near-optimal performance and a meta-heuristic algorithm to adapt the schedule at runtime. First, K-way graph partitioning divides the tasks of an application graph according to their communication patterns, placing the tasks that communicate most close to each other to limit increases in topology response time. Second, instead of assigning individual tasks to worker nodes, GT-Scheduler assigns whole partitions of tasks to nodes using a greedy strategy. If no node has enough capacity to host a given partition, that partition is iteratively divided by the K-way partitioning algorithm until it can be assigned to a suitable node. Runtime adaptation works by detecting overutilized worker nodes and reassigning their tasks, using Tabu Search and a new scoring strategy to find the best solution in which no worker node is overutilized. GT-Scheduler is implemented on standard Apache Storm, and experiments with standard benchmarks show that it outperforms R-Storm and the Online-Scheduler by at least 35% in reducing topology response time.
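The steps described in the abstract (communication-aware partitioning, greedy placement of whole partitions with re-splitting, and Tabu Search rebalancing of overutilized nodes) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use Apache Storm's scheduler API. Several pieces are assumptions: a greedy graph-growing bisection stands in for the K-way partitioner, placement starts from a single partition that is split on demand, CPU demand is the only resource, and the score function (a large penalty per unit of overload plus cross-node traffic) is a hypothetical substitute for the paper's scoring strategy. All task and node names, parameters (`penalty`, `iters`, `tenure`), and helpers are illustrative.

```python
from collections import deque

# Toy model; every name here is hypothetical, not from the paper:
#   demand[task]   -> CPU units the task needs
#   edges[(u, v)]  -> communication rate between tasks u and v
#   capacity[node] -> CPU units a worker node offers

def link(task, group, edges):
    """Communication between `task` and the tasks already in `group`."""
    return sum(w for (u, v), w in edges.items()
               if (u == task and v in group) or (v == task and u in group))

def bisect_tasks(tasks, demand, edges):
    """Greedy graph-growing bisection, a simple stand-in for K-way
    partitioning: grow one half from a seed, always adding the task that
    communicates most with it, until it holds about half of the demand."""
    order = sorted(tasks)
    half = sum(demand[t] for t in order) / 2
    part, rest = {order[0]}, set(order[1:])
    while len(rest) > 1 and sum(demand[t] for t in part) < half:
        best = max(sorted(rest), key=lambda t: link(t, part, edges))
        part.add(best)
        rest.remove(best)
    return [part, rest]  # both halves are non-empty

def place(tasks, demand, edges, capacity):
    """Greedy placement of whole partitions; a partition that fits on no
    node is split in two and both halves are retried."""
    free, assign = dict(capacity), {}
    queue = [set(tasks)]  # simplification: start from a single partition
    while queue:
        queue.sort(key=lambda p: sum(demand[t] for t in p), reverse=True)
        part = queue.pop(0)  # largest remaining partition first
        need = sum(demand[t] for t in part)
        node = max(free, key=free.get)  # node with the most spare capacity
        if need <= free[node]:
            for t in sorted(part):
                assign[t] = node
            free[node] -= need
        elif len(part) > 1:
            queue.extend(bisect_tasks(part, demand, edges))
        else:
            raise ValueError(f"task {next(iter(part))} fits on no node")
    return assign

def loads(assign, demand, capacity):
    """Total CPU demand placed on each node."""
    load = {n: 0 for n in capacity}
    for t, n in assign.items():
        load[n] += demand[t]
    return load

def score(assign, demand, edges, capacity, penalty=1000):
    """Hypothetical score: overload dominates, cross-node traffic breaks ties."""
    load = loads(assign, demand, capacity)
    overload = sum(max(0, load[n] - capacity[n]) for n in capacity)
    cut = sum(w for (u, v), w in edges.items() if assign[u] != assign[v])
    return penalty * overload + cut

def tabu_rebalance(assign, demand, edges, capacity, iters=50, tenure=3):
    """Tabu search over single-task moves off overutilized nodes."""
    current, best = dict(assign), dict(assign)
    best_score = score(best, demand, edges, capacity)
    tabu = deque(maxlen=tenure)  # recently moved tasks may not move again
    for _ in range(iters):
        load = loads(current, demand, capacity)
        hot = {n for n in capacity if load[n] > capacity[n]}
        if not hot:
            break  # no worker node is overutilized any more
        moves = [(t, n) for t in sorted(current)
                 if current[t] in hot and t not in tabu
                 for n in capacity if n != current[t]]
        if not moves:
            break

        def after(move):
            cand = dict(current)
            cand[move[0]] = move[1]
            return score(cand, demand, edges, capacity)

        t, n = min(moves, key=after)  # best non-tabu move, even if worse
        current[t] = n
        tabu.append(t)
        s = score(current, demand, edges, capacity)
        if s < best_score:
            best, best_score = dict(current), s
    return best

demand = {"a": 2, "b": 2, "c": 2, "d": 2}
edges = {("a", "b"): 10, ("c", "d"): 10, ("b", "c"): 1}
capacity = {"n1": 4, "n2": 4, "n3": 4}
initial = place(list(demand), demand, edges, capacity)
demand["a"] = 4  # workload change after initial scheduling overloads n1
adapted = tabu_rebalance(initial, demand, edges, capacity)
print(initial)  # {'a': 'n1', 'b': 'n1', 'c': 'n2', 'd': 'n2'}
print(adapted)  # {'a': 'n3', 'b': 'n1', 'c': 'n2', 'd': 'n2'}
```

In this toy run the heavily communicating pairs `a`/`b` and `c`/`d` land on the same nodes at first. After `a`'s demand grows, the search moves `a` to the idle node `n3`. Because overload is penalized far more heavily than traffic, it accepts splitting the `a`/`b` link (cut traffic rises from 1 to 11) to relieve `n1`.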
Keywords
Distributed data stream processing, Scheduling, Graph partitioning, Tabu search, Apache Storm, Heterogeneous cluster