Multiple CNN-Based Tasks Scheduling Across Shared GPU Platform in Research and Development Scenarios

IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS)(2018)

Abstract
In many AI enterprises and research institutes, a shared server or cluster built on commodity GPU hardware must simultaneously process multiple diverse CNN-based tasks submitted by different developers and researchers. Scheduling and processing these tasks, which include both training and batch inference, is a significant challenge in such practical scenarios. Previous studies, which focus on either the latency of a single training task or the throughput of multiple inference tasks, cannot effectively exploit the limited system resources available for diverse CNN-based tasks. This paper, for the first time, focuses on this specific AI Research and Development scenario and conducts a series of explorations on the characterization and scheduling of CNN-based tasks. To evaluate the quality of processing and scheduling, we propose a set of comprehensive metrics, including user satisfaction and system efficiency. Using these metrics, we characterize the behavior of several typical CNN models under different configurable application and system factors. We then explore a heuristic scheduling algorithm, informed by our characterization, that better allocates computing resources to incoming tasks and schedules them dynamically on the cluster or server. Evaluated on multi-GPU platforms against two baseline strategies, our proposed algorithm improves system efficiency by up to 40% and decreases average response latency by around 38% for multiple CNN-based tasks.
Keywords
CNN, AI Research and Development Scenario, Characterizing, Scheduling Exploration, GPU platform