Sectum: Accurate Latency Prediction for TEE-hosted Deep Learning Inference

2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)

Abstract
As the security of cloud-offloaded Deep Learning (DL) inference draws increasing attention, running DL inference in Trusted Execution Environments (TEEs) has become common practice. Latency prediction for TEE-hosted DL model inference is essential in many scenarios, such as searching DNN model architectures under a latency constraint or scheduling layers in model-parallel inference. However, existing solutions fail to address the memory over-commitment issue that arises in the resource-constrained environments inside TEEs. This paper presents Sectum, an accurate latency predictor for DL inference inside TEE enclaves. We first conduct a synthetic empirical study to analyze the relationship between inference latency and memory occupation. Based on several critical observations, Sectum predicts inference latency with a two-stage design. First, Sectum uses a Graph Neural Network (GNN)-based model to detect whether a given model would trigger memory over-commitment in TEEs. Then, combining operator-level latency modeling with linear regression, Sectum predicts the model's latency. To evaluate Sectum, we construct a large dataset that contains the latency information of over 6k CNN models. Our experiments demonstrate that Sectum achieves over 85% ±10% accuracy in latency prediction. To our knowledge, Sectum is the first method to accurately predict TEE-hosted DL inference latency.
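
The two-stage design described in the abstract (a GNN-based over-commitment detector followed by operator-level latency modeling combined via linear regression) can be sketched roughly as below. This is a minimal illustrative sketch based only on the abstract: the class names, the feature layout, and the threshold-based stand-in for the GNN detector are assumptions, not the authors' implementation.

```python
# Illustrative sketch of a Sectum-style two-stage latency predictor.
# All names and feature layouts are hypothetical; the paper's Stage 1 is a
# GNN over the model graph, approximated here by a memory-budget threshold.
from dataclasses import dataclass
from typing import List
import numpy as np
from sklearn.linear_model import LinearRegression


@dataclass
class Operator:
    # Per-operator features, e.g. [FLOPs, memory bytes, output size] (assumed layout).
    features: np.ndarray


class OvercommitDetector:
    """Stage 1: decide whether a model triggers TEE memory over-commitment."""

    def __init__(self, epc_budget_bytes: float):
        self.epc_budget_bytes = epc_budget_bytes

    def predict(self, ops: List[Operator]) -> bool:
        # Assume features[1] holds an operator's memory footprint in bytes.
        est_memory = sum(op.features[1] for op in ops)
        return est_memory > self.epc_budget_bytes


class SectumPredictor:
    """Stage 2: operator-level features combined by linear regression,
    with separate regressors for the normal and over-committed regimes."""

    def __init__(self, detector: OvercommitDetector):
        self.detector = detector
        self.regressors = {False: LinearRegression(), True: LinearRegression()}

    def fit(self, models: List[List[Operator]], latencies: List[float]) -> None:
        buckets = {False: ([], []), True: ([], [])}
        for ops, lat in zip(models, latencies):
            key = self.detector.predict(ops)
            xs, ys = buckets[key]
            xs.append(np.sum([op.features for op in ops], axis=0))
            ys.append(lat)
        for key, (xs, ys) in buckets.items():
            if xs:
                self.regressors[key].fit(np.vstack(xs), np.array(ys))

    def predict(self, ops: List[Operator]) -> float:
        key = self.detector.predict(ops)
        x = np.sum([op.features for op in ops], axis=0).reshape(1, -1)
        return float(self.regressors[key].predict(x)[0])
```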
Keywords
deep learning, inference, trusted execution environments, SGX