Demeter: Resource-Efficient Distributed Stream Processing under Dynamic Loads with Multi-Configuration Optimization
arXiv (2024)
Abstract
Distributed Stream Processing (DSP) focuses on the near real-time processing
of large streams of unbounded data. To increase processing capacities, DSP
systems are able to dynamically scale across a cluster of commodity nodes,
ensuring a good Quality of Service despite variable workloads. However,
selecting scale-out configurations that maximize resource utilization remains a
challenge. This is especially true in environments where workloads change over
time and node failures are all but inevitable. Furthermore, configuration
parameters such as memory allocation and checkpointing intervals impact
performance and resource usage as well. Sub-optimal configurations easily lead
to high operational costs, poor performance, or unacceptable loss of service.
In this paper, we present Demeter, a method for dynamically optimizing key
DSP system configuration parameters for resource efficiency. Demeter uses Time
Series Forecasting to predict future workloads and Multi-Objective Bayesian
Optimization to model runtime behaviors in relation to parameter settings and
workload rates. Together, these techniques allow us to determine whether or not
enough is known about the predicted workload rate to proactively initiate
short-lived parallel profiling runs for data gathering. Once trained, the
models guide the adjustment of multiple, potentially dependent system
configuration parameters, ensuring optimized performance and resource usage in
response to changing workload rates. Our experiments on a commodity cluster
using Apache Flink demonstrate that Demeter significantly improves the
operational efficiency of long-running benchmark jobs.
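The abstract does not give Demeter's algorithms in code. The sketch below is a rough, hypothetical illustration of the control loop it describes: forecast the workload, decide whether enough is known about that rate, and then either profile or reconfigure. It rests on several stand-in assumptions:

- Holt's linear exponential smoothing replaces the paper's time series forecaster.
- A "profiling data within ±10% of the predicted rate" test replaces the model-based knowledge check.
- Configuration choice uses a constraint-then-minimize rule (meet a latency target, then use the fewest CPU cores) instead of Multi-Objective Bayesian Optimization, which is not reproduced here.

All names and fields (`Observation`, `forecast_rate`, `is_known`, `choose_config`, `plan`, `latency_slo_ms`) are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Observation:
    """One profiled run (hypothetical data model, not Demeter's)."""
    workload_rate: float          # records/s the job processed during the run
    parallelism: int              # scale-out setting (number of parallel tasks)
    checkpoint_interval_s: int    # checkpointing interval, a second tunable parameter
    latency_ms: float             # measured end-to-end latency
    cpu_cores: float              # measured resource usage


def forecast_rate(history: List[float], alpha: float = 0.5, beta: float = 0.3,
                  horizon: int = 1) -> float:
    """Holt's linear exponential smoothing; a stand-in for a real forecaster."""
    if len(history) < 2:
        raise ValueError("need at least two observations")
    level, trend = history[0], history[1] - history[0]
    for x in history[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend


def _nearby(rate: float, observations: List[Observation],
            tolerance: float) -> List[Observation]:
    """Observations whose workload rate lies within +/- tolerance (relative) of rate."""
    return [o for o in observations
            if abs(o.workload_rate - rate) <= tolerance * rate]


def is_known(rate: float, observations: List[Observation],
             tolerance: float = 0.1) -> bool:
    """Hypothetical coverage check: do we already have data near this rate?"""
    return bool(_nearby(rate, observations, tolerance))


def choose_config(rate: float, observations: List[Observation],
                  latency_slo_ms: float,
                  tolerance: float = 0.1) -> Optional[Observation]:
    """Among nearby runs meeting the latency target, pick the one using fewest cores."""
    feasible = [o for o in _nearby(rate, observations, tolerance)
                if o.latency_ms <= latency_slo_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda o: (o.cpu_cores, o.latency_ms))


def plan(history: List[float], observations: List[Observation],
         latency_slo_ms: float) -> Tuple[str, Union[float, Observation]]:
    """Return ("profile", rate) to gather data, or ("reconfigure", config)."""
    predicted = forecast_rate(history)
    if not is_known(predicted, observations):
        # Not enough is known: launch short-lived profiling runs at this rate.
        return ("profile", predicted)
    best = choose_config(predicted, observations, latency_slo_ms)
    if best is None:
        # Data exists, but no known configuration meets the target.
        return ("profile", predicted)
    return ("reconfigure", best)
```

For example, with a steadily rising rate history `[100.0, 110.0, 120.0, 130.0]`, the forecast is 140 records/s. If profiled runs near that rate exist, `plan` returns the cheapest configuration that meets the latency target. Otherwise it asks for profiling. Treating parallelism and checkpoint interval as one joint configuration reflects the abstract's point that these parameters may be interdependent.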