RecPIM: Efficient In-Memory Processing for Personalized Recommendation Inference Using Near-Bank Architecture

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2024)

Abstract
Deep learning (DL)-based personalized recommendation systems consume a major share of the resources in modern AI data centers. The embedding layers, with their large memory capacity requirements and high bandwidth demands, have been identified as the bottleneck of personalized recommendation inference. To mitigate the memory bandwidth bottleneck, near-memory processing (NMP) is an effective solution that utilizes the through-silicon via (TSV) bandwidth within 3D-stacked DRAMs. However, existing NMP architectures suffer from limited memory bandwidth caused by hard-to-scale TSVs. To overcome this obstacle, integrating compute logic near the memory banks is a promising but challenging solution, since the large memory capacity requirement limits the use of 3D-stacked DRAMs, and irregular memory accesses lead to poor data locality, heavy TSV data traffic, and low bank-level bandwidth utilization. To address this problem, we propose RecPIM, the first in-memory processing system for personalized recommendation inference using a near-bank architecture based on 3D-stacked memory. From the hardware perspective, we introduce a heterogeneous memory system combining 3D-stacked DRAM and DIMMs to accommodate large embedding tables and provide high bandwidth. By integrating processing logic units near the memory banks on the DRAM dies, our architecture can exploit the enormous bank-level bandwidth, which is much higher than the TSV bandwidth. We then integrate a small scratchpad memory to exploit the unique data reusability of DL-based personalized recommendation systems. Furthermore, we adopt a unidirectional data communication scheme to avoid additional cross-vault data transfer. From the software perspective, we present a customized programming model to facilitate memory management and task offloading. To reduce the data communication through TSVs and enhance the utilization of bank-level bandwidth, we develop an efficient data mapping scheme that partitions each embedding vector into smaller subvectors.
Experimental results show that RecPIM achieves up to 2.58× speedup and 49.8% energy saving for data movement over the state-of-the-art NMP solution.
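The mapping scheme described in the abstract can be illustrated with a minimal sketch. Assuming (hypothetically, for illustration only) that subvector j of every embedding row is placed in bank j, each bank can gather and pool its own subvectors locally, so only small pooled partial results, rather than whole embedding vectors, need to cross the TSVs. The function names and the NumPy-based simulation below are our own assumptions, not code from the paper:

```python
# Illustrative sketch of a RecPIM-style subvector mapping (assumed layout:
# subvector j of every embedding row resides in bank j). Not the paper's code.
import numpy as np

def partition_table(table, num_banks):
    """Split each row of an embedding table into num_banks subvectors.

    Returns a list of per-bank arrays; bank j holds subvector j of every row.
    Assumes the vector dimension is divisible by num_banks (for simplicity).
    """
    rows, dim = table.shape
    assert dim % num_banks == 0, "dim must divide evenly across banks"
    sub = dim // num_banks
    return [table[:, j * sub:(j + 1) * sub] for j in range(num_banks)]

def gather_and_pool(banks, indices):
    """Each bank gathers its subvectors for the looked-up rows and pools
    (sums) them locally; concatenating the per-bank partial sums rebuilds
    the pooled embedding without moving full vectors through the TSVs."""
    partials = [bank[indices].sum(axis=0) for bank in banks]
    return np.concatenate(partials)

# Toy example: 4-row table with 8-dim vectors, split across 4 banks.
table = np.arange(32, dtype=np.float32).reshape(4, 8)
banks = partition_table(table, num_banks=4)
pooled = gather_and_pool(banks, indices=[0, 2])
# Per-bank pooling matches pooling the full vectors directly.
assert np.allclose(pooled, table[[0, 2]].sum(axis=0))
```

With this layout, a single embedding lookup fans out across all banks in parallel, which is how the scheme converts irregular table accesses into bank-level bandwidth utilization.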
Keywords
Personalized recommendation, processing-in-memory, 3D-stacked memory, data reuse, mapping scheme