Adaptive Scheduling Of Parallel Jobs In Spark Streaming

IEEE INFOCOM 2017 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS(2017)

引用 56|浏览107
暂无评分
摘要
Streaming data analytics has become increasingly vital in many applications such as dynamic content delivery (e.g., advertisements), Twitter sentiment analysis, and security event processing (e.g., intrusion detection systems, and spam filters). Emerging stream processing systems, such as Spark Streaming, treat the continuous stream as a series of micro-batches of data and continuously process these micro-batch jobs. Such micro-batch based stream processing provides several advantages over traditional stream processing systems, which process streaming data one record at a time, including fast recovery from failures, better load balancing and scalability. However, efficient scheduling of micro-batch jobs to achieve high throughput and low latency is very challenging due to the complex data dependency and dynamism inherent in streaming workloads. In this paper, we propose A-scheduler, an adaptive scheduling approach that dynamically schedules parallel micro-batch jobs in Spark Streaming and automatically adjusts scheduling parameters to improve performance and resource efficiency. Specifically, A-scheduler dynamically schedules multiple jobs concurrently using different policies based on their data dependencies and automatically adjusts the level of job parallelism and resource shares among jobs based on workload properties. We implemented A-scheduler and evaluated it with a real-time security event processing workload. Our experimental results show that A-scheduler can reduce end-to-end latency by 42% and improve workload throughput and energy efficiency by 21% and 13%, respectively, compared to the default Spark Streaming scheduler.
更多
查看译文
关键词
parallel jobs,data analytics,dynamic content delivery,intrusion detection systems,microbatch jobs,microbatch based stream processing,traditional stream processing systems,adaptive scheduling approach,data dependencies,job parallelism,resource shares,real-time security event processing workload,streaming data,data dependency,A-scheduler,Spark Streaming scheduler
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要