ENIGMA: Low-Latency and Privacy-Preserving Edge Inference on Heterogeneous Neural Network Accelerators

2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)

Abstract
Time-efficient artificial intelligence (AI) services have recently attracted increasing interest from academia and industry, driven by the urgent demands of emerging smart applications such as self-driving cars, virtual reality, and high-resolution video streaming. Existing approaches to reducing AI latency, such as edge computing and heterogeneous neural-network accelerators (NNAs), carry a high risk of privacy leakage. To achieve both low latency and privacy preservation on edge servers equipped with NNAs, this paper proposes ENIGMA, which exploits the trusted execution environment (TEE) and heterogeneous NNAs of edge servers for edge inference. Low latency is achieved by a new ahead-of-time analysis framework that analyzes the linearity of multilayer neural networks and automatically slices the forward graph, assigning each sub-graph to the TEE or an NNA. To avoid privacy leakage, we then introduce a pre-forwarded cipher generation (PFCG) scheme for computing linear sub-graphs on NNAs: the input is encrypted into ciphertext on which the linear sub-graphs can compute directly, and the result is decrypted to recover the correct output. To enable non-linear sub-graph computation inside the TEE, we use a ring cache and automatic vectorization to work around the TEE's memory limitations. Qualitative analysis and quantitative experiments on GPU, NPU, and TPU demonstrate that ENIGMA is not only compatible with heterogeneous NNAs but also prevents leakage of private features, with latency as low as 50 milliseconds.
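
The PFCG step hinges on the linearity of the sliced sub-graphs: for a linear map f(x) = Wx + b, f(x + r) = f(x) + Wr, so a randomly masked input can be evaluated by the untrusted NNA and unmasked afterwards inside the TEE. Below is a minimal NumPy sketch of this additive-masking idea; the variable names and the one-time-pad scheme are illustrative assumptions, not the paper's actual PFCG construction.

```python
import numpy as np

# Illustrative sketch of masking an input to a linear sub-graph, assuming
# PFCG exploits the linearity property  W(x + r) + b = (W x + b) + W r.
# All names here are hypothetical, not taken from the ENIGMA paper.

rng = np.random.default_rng(0)

# One linear sub-graph (a fully connected layer) that would run on the NNA.
W = rng.standard_normal((4, 8))
b = rng.standard_normal(4)

def linear_subgraph(x):
    # Untrusted accelerator: only ever sees the masked input.
    return W @ x + b

# Inside the TEE: mask the private feature with a secret random vector r.
x = rng.standard_normal(8)      # private input feature
r = rng.standard_normal(8)      # secret mask, never leaves the TEE
masked = x + r                  # "ciphertext" handed to the NNA

# Ahead of time, the TEE pre-computes the mask's image under the linear map.
mask_image = W @ r

# NNA computes on the ciphertext; the TEE then removes the mask.
y_masked = linear_subgraph(masked)
y = y_masked - mask_image       # recovers f(x) = W @ x + b

assert np.allclose(y, W @ x + b)
```

Non-linear layers (e.g., ReLU) break this identity, which is consistent with the paper's design of routing non-linear sub-graphs to the TEE instead of the NNA.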
Keywords
edge inference, privacy-preserving, low-latency, neural-network accelerators