A Highly Unified Reconfigurable Multicore Architecture to Speed Up NTT/INTT for Homomorphic Polynomial Multiplication

IEEE Transactions on Very Large Scale Integration (VLSI) Systems (2022)

Abstract
The ring learning with errors (RLWE)-based fully homomorphic encryption (FHE) scheme has become one of the most promising FHE schemes. However, its performance is limited by homomorphic multiplication, especially polynomial multiplication, which consumes most of the computing resources. Efficient implementation of polynomial multiplication is therefore crucial for high-performance FHE applications. In this article, we present an area-efficient and highly unified reconfigurable multicore number theoretic transform (NTT)/inverse NTT (INTT) architecture (named MCNA), which uses NTT and INTT for polynomial multiplication with a variable number of reconfigurable processing elements. To reduce latency, MCNA merges the preprocessing and postprocessing into the constant-geometry NTT and INTT, respectively. In addition, a reconfigurable modular multiplier based on a digital signal processor (DSP) is proposed to speed up modular multiplication. To avoid designing an independent memory access pattern for INTT, a unified read/write structure for NTT/INTT is presented. Furthermore, a novel memory access pattern named "cyclic-sharing" is proposed to reduce memory capacity by 25%. MCNA is evaluated on a Xilinx Virtex-7 field-programmable gate array (FPGA) platform. Running at a 250-MHz clock frequency, MCNA achieves $2.78\times \sim 9.32\times$ higher NTT/INTT throughput than prior works, while the area efficiency in terms of lookup tables (LUTs) and flip-flops (FFs) improves by $1.25\times \sim 4.79\times$. For polynomial multiplication, MCNA achieves $3.73\times \sim 7.69\times$ higher throughput, as well as $1.13\times \sim 14.8\times$ better area efficiency.
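As an illustration of the NTT/INTT-based polynomial multiplication the abstract refers to, the following minimal Python sketch multiplies two polynomials in $\mathbb{Z}_q[x]/(x^n+1)$ with explicit preprocessing and postprocessing steps, which are the steps MCNA is described as merging into the transforms. The parameters `N`, `Q`, `PSI` and all function names are toy assumptions chosen for readability and are not taken from the paper; the transform is evaluated directly in $O(n^2)$ time rather than with the constant-geometry dataflow, and the modular inverse via `pow(x, -1, Q)` assumes Python 3.8 or later.

```python
# Toy parameters (assumptions for illustration, not taken from the paper).
N = 8                    # ring is Z_Q[x] / (x^N + 1)
Q = 17                   # prime modulus with Q ≡ 1 (mod 2N)
PSI = 3                  # primitive 2N-th root of unity mod Q (3^8 ≡ -1 mod 17)
OMEGA = PSI * PSI % Q    # primitive N-th root of unity mod Q


def ntt(a, root):
    """Direct O(N^2) transform: out[k] = sum_i a[i] * root^(i*k) mod Q."""
    return [sum(a[i] * pow(root, i * k, Q) for i in range(N)) % Q
            for k in range(N)]


def negacyclic_mul_ntt(a, b):
    """Multiply a and b in Z_Q[x]/(x^N + 1) using NTT and INTT."""
    # Preprocessing: scale coefficient i by PSI^i.
    a_pre = [a[i] * pow(PSI, i, Q) % Q for i in range(N)]
    b_pre = [b[i] * pow(PSI, i, Q) % Q for i in range(N)]

    # Forward transforms and pointwise multiplication.
    a_hat = ntt(a_pre, OMEGA)
    b_hat = ntt(b_pre, OMEGA)
    c_hat = [x * y % Q for x, y in zip(a_hat, b_hat)]

    # Inverse transform: same routine with OMEGA^-1, scaled by N^-1 below.
    c_pre = ntt(c_hat, pow(OMEGA, -1, Q))

    # Postprocessing: scale coefficient i by N^-1 * PSI^-i.
    inv_n = pow(N, -1, Q)
    inv_psi = pow(PSI, -1, Q)
    return [c_pre[i] * inv_n * pow(inv_psi, i, Q) % Q for i in range(N)]


def negacyclic_mul_schoolbook(a, b):
    """Reference O(N^2) multiplication with the reduction x^N = -1."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q
    return c


# Usage: both methods should agree on arbitrary inputs.
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
assert negacyclic_mul_ntt(a, b) == negacyclic_mul_schoolbook(a, b)
```

In a hardware design of the kind described, the per-coefficient scalings would presumably be folded into the twiddle factors of the transform stages; the sketch keeps them separate only to make each step visible.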
Keywords
Constant-geometry (CG) number theoretic transform (NTT)/inverse NTT (INTT), fully homomorphic encryption (FHE), memory access pattern, multicore architecture, polynomial multiplier, reconfigurable processing element (RPE)