xCloudServing: Automated ML Serving Across Clouds

2023 IEEE 16th International Conference on Cloud Computing (CLOUD), 2023

Abstract
As machine learning (ML) models have grown in complexity, so too have the expenses they incur when deployed in the cloud. Reducing the costs of ML serving therefore requires optimizing the choice of cloud infrastructure, while the chosen infrastructure must still deliver on the latency constraints typically defined for cloud services. The problem is made more challenging because today's organizations often work with more than one cloud provider, and each provider offers its own unique set of interfaces and infrastructure choices. In this work we present xCloudServing - a novel system for consistent and automated deployment of ML inference services across multiple cloud providers and regions. We describe the architecture and implementation of xCloudServing, as well as the optimization algorithms implemented internally. These include established methods from the literature, as well as Niebo - our novel algorithm for minimizing cost whilst satisfying the tail latency constraint. We present simulation results for 5 different ML models over 3 cloud providers and multiple tail latency constraints, indicating that on average Niebo outperforms state-of-the-art algorithms by 37%. Additionally, we evaluate xCloudServing with live runs and demonstrate that it is robust to nondeterministic effects and exhibits reproducible behavior.
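The abstract frames the underlying decision as choosing a cloud configuration that minimizes cost while meeting a tail-latency constraint. As an illustrative sketch only (the paper's Niebo algorithm is not described here), the hypothetical Python snippet below shows the simplest possible baseline: exhaustively pick the cheapest candidate whose p99 latency satisfies the SLO. All names (CloudConfig, cheapest_feasible, the example instance types and numbers) are assumptions made for illustration.

```python
# Illustrative sketch: brute-force selection of the cheapest cloud
# configuration whose p99 latency meets a given SLO. Hypothetical names
# and data; this is NOT the paper's Niebo algorithm.
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CloudConfig:
    provider: str          # e.g. "aws", "gcp", "azure"
    instance_type: str     # e.g. "m5.xlarge"
    cost_per_hour: float   # USD per hour
    p99_latency_ms: float  # measured or estimated tail latency for the model


def cheapest_feasible(configs: Iterable[CloudConfig],
                      latency_slo_ms: float) -> Optional[CloudConfig]:
    """Return the lowest-cost configuration whose p99 latency meets the SLO."""
    feasible = [c for c in configs if c.p99_latency_ms <= latency_slo_ms]
    return min(feasible, key=lambda c: c.cost_per_hour) if feasible else None


if __name__ == "__main__":
    candidates = [
        CloudConfig("aws", "m5.xlarge", 0.192, 120.0),
        CloudConfig("gcp", "n2-standard-4", 0.194, 95.0),
        CloudConfig("azure", "D4s_v3", 0.192, 140.0),
    ]
    best = cheapest_feasible(candidates, latency_slo_ms=100.0)
    print(best)  # the only candidate under the 100 ms SLO in this toy example
```

In practice the search space spans providers, regions, and instance types, and latency measurements are noisy, which is why the paper proposes dedicated optimization algorithms and evaluates robustness to nondeterministic effects with live runs.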
Keywords
multi-cloud, optimization, machine learning serving, cloud configuration, automated cloud deployment