Multi-Exit DNN Inference Acceleration Based on Multi-Dimensional Optimization for Edge Intelligence

IEEE Transactions on Mobile Computing (2023)

Abstract
Edge intelligence, as a prospective paradigm for accelerating DNN inference, is mostly implemented via model partitioning, which inevitably incurs large transmission overhead from the DNN's intermediate data. A popular solution introduces multi-exit DNNs to reduce latency by enabling early exits. However, existing work ignores the correlation between exit settings and synergistic inference, causing incoordination between device and edge. To address this issue, this paper first investigates the bottlenecks of executing multi-exit DNNs in edge computing and builds a novel model for inference acceleration with exit selection, model partition, and resource allocation. To tackle the intractable, coupled subproblems, we propose a Multi-exit DNN inference Acceleration framework based on Multi-dimensional Optimization (MAMO). In MAMO, the exit selection subproblem is first extracted from the original problem. Then, bidirectional dynamic programming is employed to determine the optimal exit setting for an arbitrary multi-exit DNN. Finally, based on the optimal exit setting, a DRL-based policy is developed to learn joint decisions of model partition and resource allocation. We deploy MAMO on a real-world testbed and evaluate its performance in various scenarios. Extensive experiments show that it can adapt to heterogeneous tasks and dynamic networks, and can accelerate DNN inference by up to 13.7× compared with the state of the art.
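To make the exit-selection step concrete, the following Python snippet sketches how choosing which early exits to enable can be cast as dynamic programming over a chain of backbone blocks. It is a simplified, hypothetical illustration: it uses a single backward pass rather than the paper's bidirectional DP, and all latencies and exit probabilities (block_lat, head_lat, exit_prob) are invented for illustration, not taken from MAMO.

```python
# Hypothetical sketch: exit selection for a multi-exit DNN via a backward
# dynamic-programming pass. Enabling an exit adds its head latency for every
# sample that reaches it, but lets a fraction of samples terminate early.
# This is a simplification of the bidirectional DP described in the paper.

def select_exits(block_lat, head_lat, exit_prob):
    """Choose which early exits to enable to minimize the expected
    per-sample inference latency over n backbone blocks.

    block_lat[i] : latency of backbone block i
    head_lat[i]  : latency of the candidate exit head after block i
    exit_prob[i] : probability a sample is confident enough to leave at exit i
    The exit after the last block is always enabled.
    """
    n = len(block_lat)
    # dp[i] = minimal expected remaining latency for a sample entering block i
    dp = [0.0] * n
    enable = [False] * n
    # Last block: every remaining sample must exit here.
    dp[n - 1] = block_lat[n - 1] + head_lat[n - 1]
    enable[n - 1] = True
    for i in range(n - 2, -1, -1):
        skip = dp[i + 1]                                        # exit i disabled
        take = head_lat[i] + (1.0 - exit_prob[i]) * dp[i + 1]   # exit i enabled
        enable[i] = take < skip
        dp[i] = block_lat[i] + min(take, skip)
    return enable, dp[0]

if __name__ == "__main__":
    # Four backbone blocks with candidate exits (illustrative numbers, in ms).
    block_lat = [5.0, 8.0, 12.0, 20.0]
    head_lat = [1.0, 1.5, 2.0, 2.5]
    exit_prob = [0.3, 0.5, 0.7, 1.0]
    enabled, expected = select_exits(block_lat, head_lat, exit_prob)
    print("enabled exits:", enabled)
    print(f"expected latency: {expected:.2f} ms")
```

Under these assumptions, an exit is worth enabling whenever its head latency is outweighed by the expected downstream compute it saves; the actual framework additionally couples this choice with model partition and resource allocation via the DRL policy.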
Keywords
Edge intelligence, exit selection, inference acceleration, model partition, multi-exit DNN, resource allocation