Poly: Efficient Heterogeneous System and Application Management for Interactive Applications

2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019

Abstract
QoS-sensitive workloads, common in warehouse-scale datacenters, require a guaranteed, stable tail latency (a high-percentile response latency) of the service. Unfortunately, the system load (e.g., requests per second, RPS) fluctuates drastically during daily datacenter operations. To meet the maximum system RPS requirement, datacenters tend to overprovision hardware accelerators, which leaves the datacenter underutilized. As a result, scaling throughput and energy efficiency in current accelerator-equipped datacenters is very expensive for QoS-sensitive workloads. To overcome this challenge, this work introduces Poly, an OpenCL-based heterogeneous system optimization framework that aims to improve overall throughput scalability and energy proportionality while guaranteeing QoS by efficiently utilizing GPU- and FPGA-based accelerators within the datacenter. Poly is composed of two main phases. At compile time, Poly automatically captures the parallel patterns in the applications and explores a comprehensive design space within and across parallel patterns. At runtime, Poly relies on a runtime kernel scheduler to make scheduling decisions that accommodate the dynamic latency and throughput requirements. Experiments using a variety of cloud QoS-sensitive applications show that Poly improves energy proportionality by 23% (17%) without sacrificing QoS compared to the state-of-the-art GPU (FPGA) solution.
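
The runtime idea in the abstract (choosing among precompiled kernel implementations as load changes, subject to a latency bound) can be illustrated with a small sketch. This is not Poly's scheduler: the `KernelVariant` type, the `pick_variant` function, the selection rule, and all numbers below are hypothetical and chosen only to show how a latency bound, a load level, and power draw might interact.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class KernelVariant:
    """Hypothetical profile of one precompiled kernel implementation."""
    name: str
    device: str             # "GPU" or "FPGA"
    tail_latency_ms: float  # assumed profiled tail latency
    max_rps: float          # assumed sustainable requests per second
    power_w: float          # assumed power draw when active


def pick_variant(
    variants: Sequence[KernelVariant],
    load_rps: float,
    latency_bound_ms: float,
) -> Optional[KernelVariant]:
    """Pick the lowest-power variant that meets the latency bound and load.

    If no variant within the latency bound can sustain the load, fall back
    to the highest-throughput variant that still meets the bound. Return
    None when no variant meets the latency bound at all.
    """
    within_qos = [v for v in variants if v.tail_latency_ms <= latency_bound_ms]
    if not within_qos:
        return None
    sustaining = [v for v in within_qos if v.max_rps >= load_rps]
    if sustaining:
        return min(sustaining, key=lambda v: v.power_w)
    return max(within_qos, key=lambda v: v.max_rps)


# Illustrative profiles only; these are not measurements from the paper.
VARIANTS = [
    KernelVariant("fpga_small", "FPGA", 8.0, 200.0, 20.0),
    KernelVariant("fpga_wide", "FPGA", 6.0, 500.0, 35.0),
    KernelVariant("gpu_batched", "GPU", 9.0, 2000.0, 150.0),
    KernelVariant("gpu_big_batch", "GPU", 25.0, 4000.0, 180.0),
]

if __name__ == "__main__":
    for load in (100.0, 400.0, 1500.0):
        chosen = pick_variant(VARIANTS, load, latency_bound_ms=10.0)
        print(load, chosen.name if chosen else None)
```

Under these assumed profiles, low load is served by a low-power FPGA variant and higher load moves to a GPU variant, which is one simple way power draw could track load while the latency bound is still respected.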
Keywords
Kernel,Field programmable gate arrays,Throughput,Optimization,Quality of service,Cloud computing,Runtime