Automatic Pipeline Parallelism: A Parallel Inference Framework for Deep Learning Applications in 6G Mobile Communication Systems

IEEE Journal on Selected Areas in Communications (2023)

Abstract
With the rapid development of wireless communication, achieving neXt generation Ultra-Reliable and Low-Latency Communications (xURLLC) in 6G mobile communication systems has become a critical problem. Among the many applications in xURLLC, deep learning model inference requires improved efficiency. Due to the heterogeneous hardware environment in 6G, parallel schedules from distributed machine learning and edge computing have been borrowed to tackle the efficiency problem. However, traditional parallel schedules suffer from high latency, low throughput, and low device utility. In this paper, we propose Automatic Pipeline Parallelism ($AP^{2}$), a parallel inference framework for deep learning applications in 6G mobile communication systems, to improve model inference efficiency while maintaining reliability. $AP^{2}$ contains three sub-modules. A task-device affinity predictor predicts a task's expected execution time on a given device. The parallel inference arrangement optimizer finds the most suitable device for each task. The parallel inference scheduler converts the arrangement into a schedule that can be executed directly in the system. The experimental results show that $AP^{2}$ achieves better latency, throughput, reliability, and device utility than other parallel schedules. The experiments also validate the superiority of the sub-module designs.
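The abstract's three-stage structure (predict a task's per-device execution time, pick a device per task, then emit an executable schedule) can be illustrated with a minimal sketch. Everything below is a hypothetical toy, not the paper's actual algorithm: the speed/cost model, the greedy load-aware assignment policy, and all names (`predict_time`, `optimize_arrangement`, device and task fields) are assumptions made for illustration only.

```python
def predict_time(task, device):
    # Stand-in for the task-device affinity predictor: assume each device
    # exposes a relative speed factor and each task a cost in abstract
    # work units (illustrative assumption, not the paper's model).
    return task["cost"] / device["speed"]

def optimize_arrangement(tasks, devices):
    # Toy arrangement optimizer: greedily assign each task to the device
    # with the lowest (current load + predicted time), tracking per-device
    # load so work pipelines across devices instead of piling up on one.
    load = {d["name"]: 0.0 for d in devices}
    arrangement = []
    for task in tasks:
        best = min(devices, key=lambda d: load[d["name"]] + predict_time(task, d))
        load[best["name"]] += predict_time(task, best)
        arrangement.append((task["name"], best["name"]))
    return arrangement  # ordered (task, device) pairs a scheduler could execute

tasks = [{"name": "conv1", "cost": 4.0}, {"name": "conv2", "cost": 2.0},
         {"name": "fc", "cost": 1.0}]
devices = [{"name": "edge-gpu", "speed": 4.0}, {"name": "phone-cpu", "speed": 1.0}]
print(optimize_arrangement(tasks, devices))
```

With these made-up numbers, the heavy convolution tasks land on the faster edge GPU while the small fully connected layer spills to the phone CPU once the GPU's queue grows, which is the kind of heterogeneous assignment the framework's optimizer targets.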
Keywords
parallel inference framework, deep learning, deep learning applications, mobile