Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving

International Conference on Management of Data(2021)

引用 10|浏览41
暂无评分
摘要
ABSTRACTMachine learning models are widely deployed in production cloud to provide online inference services. Efficiently deploying inference services requires careful tuning of hardware and runtime configurations (e.g., GPU type, GPU memory, batch size), which can significantly improve the model serving performance and reduce cost. However, existing autoconfiguration approaches for general workloads, such as Bayesian optimization and white-box prediction, are inefficient in navigating the high-dimensional configuration space of model serving, incurring high sampling cost. In this paper, we present Morphling, a fast, near-optimal auto-configuration framework for cloud-native model serving. Morphling employs model-agnostic meta-learning to navigate the large configuration space. It trains a metamodel offline to capture the general performance trend under varying configurations. Morphling quickly adapts the metamodel to a new inference service by sampling a small number of configurations and uses it to find the optimal one. We have implemented Morphling as an auto-configuration service in Kubernetes, and evaluate its performance with popular CV and NLP models, as well as the production inference services in Alibaba. Compared with existing approaches, Morphling reduces the median search cost by 3x-22x, quickly converging to the optimal configuration by sampling only 30 candidates in a large search space consisting of 720 options.
更多
查看译文
关键词
Cloud Computing, Model Serving, Meta-Learning, Auto-Configuration
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要