Flash: A Hybrid Private Inference Protocol for Deep CNNs with High Accuracy and Low Latency on CPU
CoRR (2024)
Abstract
This paper presents Flash, an optimized hybrid private inference (PI)
protocol utilizing both homomorphic encryption (HE) and secure two-party
computation (2PC), which can reduce the end-to-end PI latency for deep CNN
models to less than 1 minute on CPU. To this end, first, Flash proposes a
low-latency convolution algorithm built upon a fast slot rotation operation and
a novel data encoding scheme, which results in 4-94x performance gain over the
state-of-the-art. Second, to minimize the communication cost introduced by the
standard nonlinear activation function ReLU, Flash replaces all ReLUs
with the polynomial x^2+x and trains deep CNN models with the new activation
function. The trained models improve the inference accuracy for CIFAR-10/100
and TinyImageNet by 16
art. Last, Flash proposes an efficient 2PC-based x^2+x evaluation protocol
that does not require any offline communication and that reduces the total
communication cost to process the activation layer by 84-196x over the
state-of-the-art. As a result, the end-to-end PI latency of Flash implemented
on CPU is 0.02 minute for CIFAR-100 and 0.57 minute for TinyImageNet
classification, while the total data communication is 0.07GB for CIFAR-100 and
0.22GB for TinyImageNet. Flash improves the state-of-the-art PI by 16-45x in
latency and 84-196x in communication cost. Moreover, even for ImageNet, Flash
can deliver a latency of less than 1 minute on CPU with total communication
of less than 1GB.
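
To illustrate why a polynomial activation such as x^2+x suits 2PC, the sketch below checks a simple algebraic identity over additive secret shares: if x = a + b, then x^2 + x = (a^2 + a) + (b^2 + b) + 2ab, so each party can compute one term locally and only the cross term 2ab needs interaction. This is a minimal sketch and not the Flash protocol itself: the ring size `MODULUS`, the function names, and the sharing scheme are all assumptions made for illustration, and the cross term is computed in the clear purely to verify the identity.

```python
import secrets

MODULUS = 2**32  # hypothetical ring size, chosen only for this illustration


def poly_activation(x: int) -> int:
    """Reference evaluation of x^2 + x in the ring Z_MODULUS."""
    return (x * x + x) % MODULUS


def share(x: int) -> tuple[int, int]:
    """Split x into two additive shares a, b with a + b = x (mod MODULUS)."""
    a = secrets.randbelow(MODULUS)
    b = (x - a) % MODULUS
    return a, b


def local_term(s: int) -> int:
    """The part each party can compute on its own share: s^2 + s."""
    return (s * s + s) % MODULUS


def recombine(a: int, b: int) -> int:
    """Check the identity (a+b)^2 + (a+b) = (a^2+a) + (b^2+b) + 2ab.

    The cross term 2ab is computed in the clear here, which a real
    protocol must NOT do; it would have to be evaluated securely.
    """
    cross = (2 * a * b) % MODULUS
    return (local_term(a) + local_term(b) + cross) % MODULUS


if __name__ == "__main__":
    for x in (0, 1, 7, 12345, MODULUS - 3):
        a, b = share(x)
        assert recombine(a, b) == poly_activation(x)
    print("identity holds for all sampled inputs")
```

Unlike ReLU, which requires a secure comparison, the polynomial uses only additions and multiplications, so the interactive work reduces to the single cross term. How Flash evaluates that term without offline communication is not described in this abstract.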