Transparent Fault Tolerance for Stateful Applications in Kubernetes with Checkpoint/Restore

Henri Schmidt, Zeineb Rejiba, Raphael Eidenbenz, Klaus-Tycho Foerster

2023 42nd International Symposium on Reliable Distributed Systems, SRDS 2023 (2023)

Abstract
This paper presents a solution that provides transparent fault tolerance for stateful containerized applications, i.e., the application does not need to structure or manage its state in any particular fashion. In the case of faults, such as node crashes or node isolation, the application resumes execution on another node. The solution relies on a Kubernetes operator and a tool to periodically checkpoint containers and to restore them from the latest checkpoint in case of a node failure. Experimental evaluations reveal the trade-offs between the overhead due to checkpointing (CPU load, memory, network bandwidth, and reduced availability) and the performance during recovery (outage time and state quality). Compared to a nontransparent solution, the transparent solution yields
Keywords
fault tolerance, container orchestration
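
The periodic checkpoint and restore-from-latest behavior described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' operator: it does not talk to Kubernetes or to any checkpointing tool, and the names Checkpoint, CheckpointStore, and simulate are hypothetical. The application state is modeled as a counter that grows by one per simulated second, so the "state quality" after recovery can be read as the number of updates lost since the last checkpoint.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Checkpoint:
    taken_at: int  # simulated time in seconds
    state: int     # stand-in for the container's in-memory state


@dataclass
class CheckpointStore:
    """Hypothetical off-node storage that survives the node failure."""
    checkpoints: List[Checkpoint] = field(default_factory=list)

    def save(self, checkpoint: Checkpoint) -> None:
        self.checkpoints.append(checkpoint)

    def latest(self) -> Optional[Checkpoint]:
        # Return the most recent checkpoint, or None if none was taken.
        return max(self.checkpoints, key=lambda c: c.taken_at, default=None)


def simulate(interval: int, failure_time: int) -> Tuple[int, int]:
    """Run until the node fails, then restore from the latest checkpoint.

    Returns (restored_state, lost_updates).
    """
    store = CheckpointStore()
    state = 0
    for t in range(1, failure_time + 1):
        state += 1  # the application does one unit of work per second
        if t % interval == 0:
            store.save(Checkpoint(taken_at=t, state=state))

    # Node failure: in-memory state is gone; resume from the latest checkpoint.
    latest = store.latest()
    restored_state = latest.state if latest is not None else 0
    return restored_state, state - restored_state


if __name__ == "__main__":
    for interval in (5, 10, 60):
        restored, lost = simulate(interval=interval, failure_time=35)
        print(f"interval={interval}s restored={restored} lost={lost}")
```

In this toy model a shorter interval loses fewer updates at the cost of more checkpoints, which mirrors the overhead versus recovery-quality trade-off the abstract mentions; the real costs (CPU load, memory, network bandwidth, outage time) are not modeled here.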