Self-managed data protection for containers.

SYSTOR 2021

Abstract
Container frameworks have been gaining popularity in recent years, with container-native storage being one of the fastest-growing segments. According to an IDC report [1], 90% of applications on cloud platforms and over 95% of new microservices are deployed in containers. The growth of container-native storage is largely driven by stateful applications [2, 3], the mainstay of enterprise IT environments. As organizations increasingly adopt containerized deployments, they must also deal with data protection to maintain business continuity. Users and application developers in containerized environments focus on designing and configuring the application, while various management aspects are handled by the orchestration system. For instance, the Kubernetes framework handles load balancing and failover of application components without active user involvement.

In the same mold, we present a self-service backup system for containerized stateful applications. Here, backups refer to physically distinct copies of point-in-time logical copies of data, used to protect data from logical corruption, human error, and cyber attacks such as ransomware. Our goal is to let users convey their data protection requirements, such as Recovery Point Objective (RPO) or retention period, in the same manner as application configuration. The RPO expresses the data-loss tolerance of the backup: it is said to be T hours if the application can lose no more data than the changes made in the last T hours. Given the user requirements, the backup system snapshots the application data volumes and copies the snapshots out to an external repository. If the primary volume fails, it can be restored from the repository. In traditionally managed data centers, administrators are responsible for assessing the load on the storage system and configuring backup objectives for all users accordingly.
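Declaring protection requirements "in the same manner as application configuration" suggests a Kubernetes-style declarative resource. The following is a hypothetical sketch only: the `BackupPolicy` kind, the API group, and all field names are illustrative assumptions, not the paper's actual interface.

```yaml
# Hypothetical custom resource: the user states RPO and retention,
# not when or how often the volumes are snapshotted.
apiVersion: backup.example.com/v1
kind: BackupPolicy
metadata:
  name: orders-db-protection
spec:
  selector:
    matchLabels:
      app: orders-db        # stateful application whose volumes to protect
  rpoHours: 4               # tolerate at most 4 hours of data loss
  retentionDays: 30         # keep backup copies for 30 days
  repository: offsite-s3    # external repository to copy snapshots to
```

A user would apply and monitor such a resource with the usual `kubectl apply` / `kubectl get` workflow, leaving scheduling decisions to the backup system.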
However, self-service containerized environments present the following challenges. First, in such environments each application instance, i.e., a stateful container, is provisioned with a separate volume, so a cluster may consist of thousands of small volumes with varying RPOs. Moreover, end users may not be aware of failures or resource fluctuations, e.g., load on the storage system, and thus cannot know whether the specified data protection guarantee can be fulfilled. Second, when the storage system is used by IO-intensive applications, the application traffic competes with the backup traffic, and maintaining the RPO of backups becomes challenging. If the rate of data copying for backups is increased to meet the RPO, the applications may suffer from the adverse impact; conversely, if the rate is decreased to reduce interference with application traffic, the backup may violate the promised RPO guarantee.

We address these problems in the proposed self-service backup system. To protect the data of their stateful applications, users simply specify the desired protection policies, namely RPO and retention period, and need not dictate when or how frequently the volumes are backed up. Users can request and monitor backups through a familiar Kubernetes interface, i.e., kubectl. Additionally, we propose adaptive scheduling to minimize the number of snapshots required to provide the necessary RPO guarantee. The scheduler varies the interval between snapshots based on system load. When the storage system or network is under heavy load, or experiencing failures or outages, the snapshot frequency is increased, affording more time for offloading the data. At other times, the scheduler reduces the snapshot frequency, thereby reducing system processing and network overhead. The system is also resilient to resource, component, and backup-job failures: it treats each operation (request creation, snapshot, data copy) as a transaction and leverages Kubernetes to restart the erroneous or failed component on the same or a different node. The transactional semantics ensure continuity through such restarts. Our evaluation with thousands of volumes with varying RPOs shows that our system quickly reacts to resource fluctuations and minimizes RPO violations compared to fixed schedulers.
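The adaptive scheduling idea above can be sketched in a few lines. This is a minimal illustration under assumptions not stated in the abstract: the scheduler is assumed to see an estimate of the current copy-out rate, and it picks a per-volume snapshot interval so that each snapshot plus its copy-out fits within the RPO window. The function name and the 10% interval floor are invented for the example.

```python
# Sketch of load-adaptive snapshot scheduling (illustrative, not the
# paper's algorithm). Worst-case data loss is roughly the snapshot
# interval plus the copy-out time, so the interval shrinks when the
# estimated copy-out time grows (heavy load) and relaxes toward the
# RPO when the system is idle.

def snapshot_interval_hours(rpo_hours, volume_gb, copy_rate_gb_per_hour):
    """Return the interval between snapshots for one volume."""
    copy_time = volume_gb / copy_rate_gb_per_hour  # estimated offload time
    # Leave room for the copy-out inside the RPO window; keep a small
    # floor so the scheduler does not thrash under extreme load.
    interval = max(rpo_hours - copy_time, 0.1 * rpo_hours)
    return min(interval, rpo_hours)

# Under light load the interval approaches the RPO (fewer snapshots,
# less overhead); under heavy load it shrinks, so a fresher snapshot is
# always available while each copy-out gets more slack.
light = snapshot_interval_hours(rpo_hours=4, volume_gb=100,
                                copy_rate_gb_per_hour=400)
heavy = snapshot_interval_hours(rpo_hours=4, volume_gb=100,
                                copy_rate_gb_per_hour=40)
print(light, heavy)  # prints 3.75 1.5
```

The same computation, run per volume, naturally handles a cluster of thousands of small volumes with differing RPOs, since each volume's interval depends only on its own RPO, size, and the shared load estimate.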