GForce: GPU-Friendly Oblivious and Rapid Neural Network Inference

PROCEEDINGS OF THE 30TH USENIX SECURITY SYMPOSIUM (2021)

Abstract
Neural-network classification is becoming increasingly pervasive. It processes data about the subjects being classified, e.g., facial appearance for face recognition, which is personal and often sensitive. Oblivious inference protects the data privacy of both the query and the model, but it is not as fast or as accurate as its plaintext counterpart. A recent cryptographic solution, Delphi (USENIX Security 2020), strives for low latency by using the GPU on linear layers and replacing some non-linear units in the model at a cost in accuracy: it can handle a query on CIFAR-100 with ~68% accuracy in 14s, or with ~66% accuracy in 2.6s. We propose GForce, which tackles the latency issue at its root causes instead of approximating non-linear computations. Building on the SWALP training approach (ICML 2019), we propose stochastic rounding and truncation (SRT) layers, which fuse quantization with dequantization between non-linear and linear layers and free us from floating-point operations for efficiency. They also ensure high accuracy while working over the severely finite cryptographic field. We further propose a suite of GPU-friendly secure online/offline protocols for common operations, including comparison and wrap-around handling, which benefit non-linear layers, including our SRT layers. With these two innovations, GForce supports VGG-16, attaining ~73% accuracy over CIFAR-100 for the first time, in 0.4s. Compared with the prior best non-approximated solution (USENIX Security 2018), GForce speeds up the non-linear layers in VGG by more than 34x. Our techniques shed light on a new direction that utilizes the GPU throughout the model to minimize latency.
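To make the SRT idea concrete, below is a minimal Python sketch of unbiased stochastic rounding and of the wrap-around problem that naive share-wise truncation creates over a finite field. This is an illustrative sketch only, not GForce's actual implementation: the modulus P, the scaling factor SCALE, and all helper names are our own assumptions.

```python
import random

P = 2**31 - 1   # stand-in prime modulus for the cryptographic field (our assumption)
SCALE = 2**8    # fixed-point scaling factor for quantized activations (our assumption)

def stochastic_round(x: float) -> int:
    """Unbiased rounding: round up with probability equal to the fractional
    part of x, so the expected output equals x."""
    lo = int(x // 1)                        # floor, correct for negative x too
    return lo + (random.random() < x - lo)

def share(v: int) -> tuple[int, int]:
    """Additively secret-share a field element between two parties."""
    r = random.randrange(P)
    return r, (v - r) % P

def truncate_shares(s0: int, s1: int) -> tuple[int, int]:
    """Naive share-wise division by SCALE after a linear layer. When the two
    shares sum past P (a wrap-around), the reconstructed result is off by
    roughly P // SCALE; handling this error securely is one job of GForce's
    online/offline protocols."""
    return s0 // SCALE, s1 // SCALE

# Quantize a real activation onto the field, share it, then truncate.
v = stochastic_round(3.7 * SCALE) % P
s0, s1 = share(v)
t0, t1 = truncate_shares(s0, s1)
print((t0 + t1) % P)   # close to 3.7, or off by ~P // SCALE if the shares wrapped
```

In the paper, such quantization and truncation are fused into dedicated SRT layers and the comparison and wrap-around handling run under cryptographic protection on the GPU; the snippet only conveys the arithmetic intuition behind why truncation over a finite field needs special care.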