Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation

2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2022

Abstract
Personalized recommendation is an important class of deep-learning applications that powers a large collection of internet services and consumes a considerable amount of datacenter resources. As the scale of production-grade recommendation systems continues to grow, optimizing their serving performance and efficiency in a heterogeneous datacenter is important and can translate into infrastructure capacity savings. In this paper, we propose Hercules, an optimized framework for personalized recommendation inference serving that targets diverse industry-representative models and cloud-scale heterogeneous systems. Hercules performs a two-stage optimization procedure: offline profiling and online serving. The first stage searches the large, under-explored task scheduling space with a gradient-based search algorithm, achieving up to 9.0× latency-bounded throughput improvement on individual servers; it also identifies the optimal heterogeneous server architecture for each recommendation workload. The second stage performs heterogeneity-aware cluster provisioning to optimize resource mapping and allocation in response to fluctuating diurnal loads. The proposed cluster scheduler in Hercules achieves 47.7% cluster capacity saving and reduces the provisioned power by 23.7% over a state-of-the-art greedy scheduler.
Keywords
greedy scheduler,heterogeneity-aware cluster provisioning,latency-bounded throughput improvement,task scheduling space,resource mapping,optimal heterogeneous server architecture,gradient-based search algorithm,online serving,offline profiling,two-stage optimization procedure,cloud-scale heterogeneous systems,industry-representative models,personalized recommendation inference,optimized framework,heterogeneous datacenter,production-grade recommendation systems,Internet services,deep-learning applications,heterogeneity-aware inference serving,Hercules