CheckMate: Evaluating Checkpointing Protocols for Streaming Dataflows
arXiv (2024)
Abstract
Stream processing in the last decade has seen broad adoption in both
commercial and research settings. One key element for this success is the
ability of modern stream processors to handle failures while ensuring
exactly-once processing guarantees. At the time of writing, virtually all
stream processors that guarantee exactly-once processing implement a variant of
Apache Flink's coordinated checkpoints - an extension of the original
Chandy-Lamport checkpoints from 1985. However, the reasons behind this
prevalence of the coordinated approach remain anecdotal, as reported by
practitioners of the stream processing community. At the same time, common
checkpointing approaches, such as the uncoordinated and the
communication-induced ones, remain largely unexplored.
This paper is the first to address this gap by i) shedding light on why
practitioners have favored the coordinated approach and ii) investigating
whether there are viable alternatives. To this end, we implement three
checkpointing approaches that we surveyed and adapted for the distinct needs of
streaming dataflows. Our analysis shows that the coordinated approach
outperforms the uncoordinated and communication-induced protocols under
uniformly distributed workloads. To our surprise, however, the uncoordinated
approach is not only competitive with the coordinated one in uniformly
distributed workloads, but it also outperforms the coordinated approach in
skewed workloads. We conclude that rather than blindly employing coordinated
checkpointing, research should focus on optimizing the very promising
uncoordinated approach, as it can address issues with skew and support
prevalent cyclic queries. We believe that our findings can trigger further
research into checkpointing mechanisms.