Sectum: Accurate Latency Prediction for TEE-hosted Deep Learning Inference

2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)

Abstract
As the security of cloud-offloaded Deep Learning (DL) inference draws increasing attention, running DL inference in Trusted Execution Environments (TEEs) has become common practice. Latency prediction for TEE-hosted DL model inference is essential in many scenarios, such as searching DNN model architectures under a latency constraint or scheduling layers in model-parallel inference. However, existing solutions fail to address the memory over-commitment issue that arises in the resource-constrained environments inside TEEs. This paper presents Sectum, an accurate latency predictor for DL inference inside TEE enclaves. We first conduct a synthetic empirical study to analyze the relationship between inference latency and memory occupation. Based on several critical observations, Sectum predicts inference latency with a two-stage design. First, Sectum uses a Graph Neural Network (GNN)-based model to detect whether a given model would trigger memory over-commitment in TEEs. Then, combining operator-level latency modeling with linear regression, Sectum predicts the model's latency. To evaluate Sectum, we construct a large dataset that contains the latency information of over 6k CNN models. Our experiments demonstrate that Sectum achieves over 85% ±10% accuracy in latency prediction. To our knowledge, Sectum is the first method to accurately predict TEE-hosted DL inference latency.
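
The two-stage design described in the abstract (a GNN-based over-commitment detector followed by operator-level latency modeling combined via linear regression) can be sketched roughly as below. This is a minimal illustrative sketch based only on the abstract: the class names, the feature layout, and the threshold-based stand-in for the GNN detector are assumptions, not the authors' implementation.

```python
# Illustrative sketch of a Sectum-style two-stage latency predictor.
# All names and feature layouts are hypothetical; the paper's Stage 1 is a
# GNN over the model graph, approximated here by a memory-budget threshold.
from dataclasses import dataclass
from typing import List
import numpy as np
from sklearn.linear_model import LinearRegression


@dataclass
class Operator:
    # Per-operator features, e.g. [FLOPs, memory bytes, output size] (assumed layout).
    features: np.ndarray


class OvercommitDetector:
    """Stage 1: decide whether a model triggers TEE memory over-commitment."""

    def __init__(self, epc_budget_bytes: float):
        self.epc_budget_bytes = epc_budget_bytes

    def predict(self, ops: List[Operator]) -> bool:
        # Assume features[1] holds an operator's memory footprint in bytes.
        est_memory = sum(op.features[1] for op in ops)
        return est_memory > self.epc_budget_bytes


class SectumPredictor:
    """Stage 2: operator-level features combined by linear regression,
    with separate regressors for the normal and over-committed regimes."""

    def __init__(self, detector: OvercommitDetector):
        self.detector = detector
        self.regressors = {False: LinearRegression(), True: LinearRegression()}

    def fit(self, models: List[List[Operator]], latencies: List[float]) -> None:
        buckets = {False: ([], []), True: ([], [])}
        for ops, lat in zip(models, latencies):
            key = self.detector.predict(ops)
            xs, ys = buckets[key]
            xs.append(np.sum([op.features for op in ops], axis=0))
            ys.append(lat)
        for key, (xs, ys) in buckets.items():
            if xs:
                self.regressors[key].fit(np.vstack(xs), np.array(ys))

    def predict(self, ops: List[Operator]) -> float:
        key = self.detector.predict(ops)
        x = np.sum([op.features for op in ops], axis=0).reshape(1, -1)
        return float(self.regressors[key].predict(x)[0])
```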
Keywords
deep learning, inference, trusted execution environments, SGX