HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline

Richard Liaw, Romil Bhardwaj, Lisa Dunlap, Yitian Zou, Joseph E. Gonzalez, Ion Stoica, Alexey Tumanov

SoCC '19: ACM Symposium on Cloud Computing, Santa Cruz, CA, USA, November 2019

Abstract

Prior research in resource scheduling for machine learning training workloads has largely focused on minimizing job completion times. Commonly, these model training workloads collectively search over a large number of parameter values that control the learning process in a hyperparameter search. It is preferable to identify and maximally provision the best-performing hyperparameter configuration (trial) to achieve the highest accuracy as soon as possible. To optimally trade off evaluating multiple configurations against training the most promising ones by a fixed deadline, we design and build HyperSched, a dynamic application-level resource scheduler that tracks, identifies, and preferentially allocates resources to the best-performing trials to maximize accuracy by the deadline. HyperSched leverages three properties of a hyperparameter search workload overlooked in prior work (trial disposability, progressively identifiable rankings among different configurations, and space-time constraints) to outperform standard hyperparameter search algorithms across a variety of benchmarks.
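
The trade-off described above, evaluating many trials early and concentrating resources on the best ones as the deadline nears, can be illustrated with a minimal sketch. It assumes a simple linear schedule in which the fraction of trials kept alive shrinks with the fraction of time remaining, lower-ranked trials are terminated (trial disposability), and GPUs are split as evenly as possible among survivors with any remainder going to the highest-ranked trials. The names `Trial` and `allocate`, the linear schedule, and the even split are hypothetical illustrations, not HyperSched's actual policy or API.

```python
import math
from dataclasses import dataclass


@dataclass
class Trial:
    name: str
    score: float  # latest validation accuracy (higher is better)


def allocate(trials, total_gpus, time_left, deadline):
    """Hypothetical deadline-aware allocation (not HyperSched's real policy).

    Keeps a share of the top-ranked trials proportional to the remaining time,
    discards the rest, and splits `total_gpus` among the survivors.
    """
    # Rank trials by their current score, best first.
    ranked = sorted(trials, key=lambda t: t.score, reverse=True)

    # Assumed linear schedule: keep all trials at the start, one at the deadline.
    frac = max(0.0, min(1.0, time_left / deadline))
    n_keep = math.ceil(len(ranked) * frac)

    # Keep at least one trial, and never more trials than GPUs.
    n_keep = max(1, min(n_keep, len(ranked), total_gpus))
    survivors = ranked[:n_keep]

    # Even split; leftover GPUs go to the best-ranked survivors first.
    base, extra = divmod(total_gpus, n_keep)
    return {t.name: base + (1 if i < extra else 0) for i, t in enumerate(survivors)}
```

For example, with four trials and 8 GPUs at the halfway point, only the top two trials survive and each receives 4 GPUs; close to the deadline, all 8 GPUs go to the single best trial. A real scheduler would also weigh the cost of restarting trials with more resources and how reliable early rankings are, which this sketch ignores.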

Keywords

Machine Learning Scheduling, Hyperparameter Optimization, Distributed Machine Learning