NeRCC: Nested-Regression Coded Computing for Resilient Distributed Prediction Serving Systems
CoRR (2024)
Abstract
Resilience against stragglers is a critical element of prediction serving
systems, tasked with executing inferences on input data for a pre-trained
machine-learning model. In this paper, we propose NeRCC, a general
straggler-resistant framework for approximate coded computing. NeRCC includes
three layers: (1) encoding regression and sampling, which generates coded data
points, as a combination of original data points, (2) computing, in which a
cluster of workers run inference on the coded data points, (3) decoding
regression and sampling, which approximately recovers the predictions of the
original data points from the available predictions on the coded data points.
We argue that the overall objective of the framework reveals an underlying
interconnection between two regression models in the encoding and decoding
layers. We propose a solution to the nested regressions problem by summarizing
their dependence on two regularization terms that are jointly optimized. Our
extensive experiments on different datasets and various machine learning
models, including LeNet5, RepVGG, and Vision Transformer (ViT), demonstrate
that NeRCC accurately approximates the original predictions in a wide range of
straggler scenarios, outperforming the state of the art by up to 23%.
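The three-layer pipeline above can be sketched with plain polynomial regression standing in for both the encoding and decoding layers. This is a simplification of the paper's nested, jointly regularized regressions; all function names, points, and parameters below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def encode(x, alphas, betas):
    # Layer 1 (sketch): fit an encoding polynomial u with u(alpha_k) = x_k,
    # then sample it at the workers' evaluation points beta_j to obtain
    # coded data points as combinations of the originals.
    coeffs = np.polyfit(alphas, x, deg=len(x) - 1)
    return np.polyval(coeffs, betas)

def decode(y_available, betas_available, alphas, deg):
    # Layer 3 (sketch): fit a decoding polynomial to the predictions that
    # non-straggler workers returned, then evaluate it back at the encoding
    # points alpha_k to approximate predictions on the original data.
    coeffs = np.polyfit(betas_available, y_available, deg=deg)
    return np.polyval(coeffs, alphas)

# Toy setup: K = 3 scalar data points, N = 8 workers, model f(x) = x^2.
f = lambda x: x ** 2
x = np.array([1.0, 2.0, 3.0])
alphas = np.array([-1.0, 0.0, 1.0])   # encoding points (assumed)
betas = np.linspace(-2.0, 2.0, 8)     # worker evaluation points (assumed)

coded = encode(x, alphas, betas)      # layer 1: encoding regression + sampling
y = f(coded)                          # layer 2: workers run inference
alive = (np.arange(8) != 2) & (np.arange(8) != 5)  # workers 2 and 5 straggle
# Here f∘u has degree 2*(K-1) = 4, so the 6 surviving results suffice for an
# exact fit; for general models NeRCC only approximates the predictions.
x_hat = decode(y[alive], betas[alive], alphas, deg=4)
print(np.allclose(x_hat, f(x)))
```

With a nonlinear model the composed function is no longer low-degree, which is why the framework is approximate and why the choice of regression and sampling points in the two layers interacts, motivating the joint optimization of the two regularization terms.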