Online Placement and Scaling of Geo-Distributed Machine Learning Jobs via Volume-Discounting Brokerage

IEEE Transactions on Parallel and Distributed Systems(2020)

引用 7|浏览119
暂无评分
摘要
Geo-distributed machine learning (ML) often uses large geo-dispersed data collections produced over time to train global models, without consolidating the data to a central site. In the parameter server architecture, “workers” and “parameter servers” for a geo-distributed ML job should be strategically deployed and adjusted on the fly, to allow easy access to the datasets and fast exchange of the model parameters at any time. Despite many cloud platforms now provide volume discounts to encourage the usage of their ML resources, different geo-distributed ML jobs that run in the clouds often rent cloud resources separately and respectively, thus rarely enjoying the benefit of discounts. We study an ML broker service that aggregates geo-distributed ML jobs into cloud data centers for volume discounts via dynamic online placement and scaling of workers and parameter servers in individual jobs for long-term cost minimization. To decide the number and the placement of workers and parameter servers, we propose an efficient online algorithm which first decomposes the online problem into a series of one-shot optimization problems solvable at each individual time slot by the technique of regularization, and afterwards round the fractional decisions to the integer ones via a carefully-designed dependent rounding method. We prove a parameterized-constant competitive ratio for our online algorithm as the theoretical performance analysis, and also conduct extensive simulation studies to exhibit its close-to-offline-optimum practical performance in realistic settings.
更多
查看译文
关键词
Data centers,Data models,Servers,Cloud computing,Machine learning,Optimization,Minimization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要