PipeEdge: Pipeline Parallelism for Large-Scale Model Inference on Heterogeneous Edge Devices

2022 25th Euromicro Conference on Digital System Design (DSD)(2022)

Abstract
Deep neural networks with large model sizes achieve state-of-the-art results for tasks in computer vision and natural language processing. However, such models are too compute- or memory-intensive for resource-constrained edge devices. Prior works on parallel and distributed execution primarily focus on training rather than inference, and target homogeneous accelerators in data centers. We propose PipeEdge, a distributed framework for edge systems that uses pipeline parallelism both to speed up inference and to enable running larger, more accurate models that otherwise cannot fit on a single edge device. PipeEdge uses an optimal partition strategy that accounts for heterogeneity in compute, memory, and network bandwidth. Our empirical evaluation demonstrates that PipeEdge achieves 11.88× and 12.78× speedup using 16 edge devices for the ViT-Huge and BERT-Large models, respectively, with no accuracy loss. Similarly, PipeEdge improves throughput for ViT-Huge (which cannot fit on a single device) by 3.93× over a 4-device baseline when using 16 edge devices. Finally, we show up to 4.16× throughput improvement over the state-of-the-art PipeDream when using a heterogeneous set of devices.
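The core idea behind the partition strategy is that pipeline throughput is limited by the slowest stage, so model layers should be split into contiguous stages that minimize the bottleneck stage's time across devices of different speeds. The sketch below is a toy illustration of that idea only, with hypothetical names and a fixed device order; it ignores the memory and network-bandwidth constraints that the paper's actual partitioner also considers.

```python
from functools import lru_cache

def partition_bottleneck(layer_costs, device_speeds):
    """Toy sketch: assign contiguous layer ranges, one stage per device
    (devices in fixed pipeline order), minimizing the slowest stage's time.
    Not the paper's algorithm: memory and bandwidth constraints are omitted."""
    n, m = len(layer_costs), len(device_speeds)
    # Prefix sums let us compute any contiguous range's cost in O(1).
    prefix = [0.0]
    for c in layer_costs:
        prefix.append(prefix[-1] + c)

    def stage_time(i, j, d):
        # Execution time of layers [i, j) on device d.
        return (prefix[j] - prefix[i]) / device_speeds[d]

    @lru_cache(maxsize=None)
    def best(i, d):
        # Minimum bottleneck for layers [i, n) on devices [d, m).
        if d == m - 1:
            return stage_time(i, n, d)  # last device takes all remaining layers
        # Each remaining device must get at least one layer.
        return min(max(stage_time(i, j, d), best(j, d + 1))
                   for j in range(i + 1, n - (m - d - 1) + 1))

    return best(0, 0)

# Four equal-cost layers on two equal devices: two layers per stage.
print(partition_bottleneck([1, 1, 1, 1], [1, 1]))  # 2.0
```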
Keywords
deep learning,parallel execution,edge devices,large model inference