Design and Performance Analysis of Partial Computation Output Schemes for Accelerating Coded Machine Learning

IEEE Transactions on Network Science and Engineering (2023)

Abstract
Coded machine learning is a technique that uses codes, such as $(n,q)$-maximum-distance-separable ($(n,q)$-MDS) codes, to reduce the negative effect of stragglers by requiring only $q$ out of $n$ workers to complete their computation. However, the MDS scheme is significantly inefficient: it wastes stragglers' unfinished computation and keeps faster workers idle. Accordingly, this paper proposes to fragment each worker's load into small pieces and to utilize all workers' partial computation outputs (PCO) to reduce the overall runtime. While the PCO scheme is easy to implement, its theoretical runtime analysis is challenging. We present new bounds and asymptotic analysis to prove that our PCO scheme always reduces the overall runtime for any random distribution of workers' speeds, and that its performance gain over the MDS scheme can be arbitrarily large under high variability of workers' speeds. Moreover, our analysis shows another advantage: the PCO scheme's performance is robust and insensitive to variations in system parameters, whereas the MDS scheme must know the workers' speeds to carefully optimize $q$. Finally, we implement our PCO scheme to solve linear regression, a typical machine learning problem, and our realistic experiments validate that it reduces the overall runtime of the MDS scheme by at least $12.3\%$.
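To make the comparison in the abstract concrete, the following Python sketch simulates the overall runtime of the two schemes under a random worker-speed model. It is a minimal illustration, not the paper's implementation: the shifted-exponential speed distribution, the piece count $m$, and the piece-level recovery threshold of $qm$ total pieces are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mds_runtime(speeds, q):
    """(n, q)-MDS scheme: the job is split into q equal units and
    encoded into n coded units, one per worker; the job finishes
    when the q-th fastest worker completes its entire unit."""
    finish_times = (1.0 / q) / speeds       # time for each worker to finish its unit
    return np.sort(finish_times)[q - 1]     # q-th order statistic

def pco_runtime(speeds, q, m):
    """Idealized PCO scheme: each worker's unit is fragmented into
    m pieces returned as soon as they complete; the job finishes
    once q*m coded pieces have arrived in total, so stragglers'
    partial work counts and fast workers are never idle."""
    piece_work = 1.0 / (q * m)              # work per piece
    piece_times = np.concatenate(
        [np.arange(1, m + 1) * piece_work / s for s in speeds]
    )
    return np.sort(piece_times)[q * m - 1]  # (q*m)-th piece arrival

# Hypothetical straggler model: shifted-exponential worker speeds.
n, q, m, trials = 10, 6, 8, 10_000
gains = []
for _ in range(trials):
    speeds = 1.0 + rng.exponential(scale=2.0, size=n)
    gains.append(1 - pco_runtime(speeds, q, m) / mds_runtime(speeds, q))
print(f"average simulated runtime reduction: {np.mean(gains):.1%}")
```

Under this simple model the PCO runtime never exceeds the MDS runtime: by the time the $q$-th fastest worker finishes its whole unit, at least $qm$ pieces have already completed across all workers, so the piece-level threshold is met no later. This mirrors, in a toy setting, the abstract's claim that the PCO scheme always reduces the overall runtime.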
Keywords
Coded machine learning, maximum-distance-separable codes, partial computation outputs, performance bound analysis