MAD: Memory-Aware Design Techniques for Accelerating Fully Homomorphic Encryption

56th IEEE/ACM International Symposium on Microarchitecture (MICRO 2023)

Abstract
Cloud computing has made it easier for individuals and companies to access large compute and memory resources. However, it has also raised privacy concerns about the data that users share with remote cloud servers. Fully homomorphic encryption (FHE) offers a solution to this problem by enabling computation over encrypted data. Unfortunately, all known FHE constructions require a noise term for security, and this noise grows during computation. To perform unlimited computation on encrypted data, a periodic noise-reduction step known as bootstrapping is required. Bootstrapping is memory-bound, as it operates on several gigabytes of data, which makes computing on encrypted data orders of magnitude slower than computing on unencrypted data. In this work, we first present an in-depth analysis of the bootstrapping operation in the CKKS FHE scheme. Consistent with prior work, we observe that CKKS bootstrapping exhibits low arithmetic intensity (< 1 Op/byte). We then propose memory-aware design (MAD) techniques to accelerate CKKS bootstrapping. Our MAD techniques are agnostic to the underlying compute platform and apply equally to GPUs, CPUs, FPGAs, and ASICs. They use several caching optimizations that maximize data reuse and reorder operations to reduce the amount of data transferred to and from main memory. In addition, they include algorithmic optimizations that reduce the number of data-access-pattern switches and of expensive NTT operations. Applying our MAD optimizations improves bootstrapping arithmetic intensity by 3x. For logistic regression (LR) training, an existing GPU design gains up to 3.5x in performance for the same on-chip memory size by leveraging our MAD optimizations. Similarly, existing ASIC designs gain up to 27x and 57x in performance for LR training and ResNet-20 inference, respectively, while reducing the on-chip memory requirement by 16x, which proportionally reduces the cost of the solution.
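Arithmetic intensity here means the number of operations executed per byte moved to or from main memory; when it is below roughly 1 Op/byte, memory bandwidth rather than compute throughput limits performance. The sketch below is illustrative only, not code from the paper or from SimFHE: it shows how arithmetic intensity is computed and how fusing two element-wise passes over a ciphertext limb (a small-scale instance of the reuse-oriented reordering the MAD techniques apply to full CKKS kernels) doubles the ops performed per byte streamed from memory. The limb length, modulus, and operand values are hypothetical placeholders.

```python
# Illustrative sketch (not from the paper or SimFHE). All sizes and values
# below are hypothetical placeholders chosen only to make the example run.

def arithmetic_intensity(ops, bytes_moved):
    """Operations performed per byte transferred to/from main memory."""
    return ops / bytes_moved

def scale_then_add_unfused(limb, scale, offset, modulus):
    # Pass 1: read every coefficient once, write a temporary array back to memory.
    tmp = [(c * scale) % modulus for c in limb]
    # Pass 2: read the temporary again, write the final result.
    return [(c + offset) % modulus for c in tmp]

def scale_then_add_fused(limb, scale, offset, modulus):
    # Single pass: both modular operations are applied while the coefficient
    # is still at hand, so each coefficient is read and written only once.
    return [(c * scale + offset) % modulus for c in limb]

if __name__ == "__main__":
    n, q = 1 << 16, (1 << 59) - 55   # hypothetical limb length and modulus
    limb = list(range(n))
    coeff_bytes = 8                  # one 64-bit residue per coefficient

    # Unfused: two passes => 2 reads + 2 writes per coefficient for 2 ops.
    unfused_ai = arithmetic_intensity(2 * n, 4 * n * coeff_bytes)
    # Fused: one pass => 1 read + 1 write per coefficient for the same 2 ops.
    fused_ai = arithmetic_intensity(2 * n, 2 * n * coeff_bytes)

    assert scale_then_add_unfused(limb, 3, 7, q) == scale_then_add_fused(limb, 3, 7, q)
    print(f"unfused: {unfused_ai:.3f} Op/byte, fused: {fused_ai:.3f} Op/byte")
```

Real CKKS bootstrapping chains many such passes (NTTs, base conversions, key switching) over much larger operands; the paper's reported 3x arithmetic-intensity improvement comes from applying caching, reuse, and reordering ideas of this flavor at that scale, not from this toy example.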
Keywords
Fully Homomorphic Encryption, CKKS Scheme, Memory Bottleneck Analysis, Cache Optimizations, SimFHE, Hardware Acceleration, Bootstrapping