184QPS/W 64Mb/mm² 3D Logic-to-DRAM Hybrid Bonding with Process-Near-Memory Engine for Recommendation System

2022 IEEE International Solid-State Circuits Conference (ISSCC)

Abstract
The era of AI computing brings significant challenges to traditional computer systems. As shown in Fig. 29.1.1, while AI model computation requirements increase 750x every two years, memory system capability, in terms of both capacity and bandwidth, has improved at a much slower pace. Many memory-bound applications, such as natural language processing, recommendation systems, graph analytics, graph neural networks, and multi-task online inference, have become dominant AI workloads in modern cloud datacenters. The primary memory technologies that power AI systems and applications today include on-chip memory (SRAM), 2.5D-integrated memory (HBM [1]), and off-chip memory (DDR, LPDDR, or GDDR SDRAM). Although on-chip memory offers low-energy access compared to off-chip memory, its limited capacity prevents large AI models from being deployed efficiently, forcing intensive and costly off-chip memory accesses. In addition, the energy consumed by data movement in off-chip memory solutions (HBM and DRAM) is several orders of magnitude larger than that of on-chip memory, giving rise to the well-known "memory wall" [2] problem in AI systems. In recent years, process-near-memory (PNM) and computing-in-memory (CIM) have emerged as promising candidates to tackle the "memory wall" problem.
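As a rough illustration of why recommendation inference is memory-bound (the workload this PNM engine targets), the sketch below models a DLRM-style sparse embedding lookup in NumPy. The table sizes, lookup counts, and names are illustrative assumptions, not taken from the paper; real production tables are far larger than on-chip SRAM can hold.

```python
import numpy as np

# Illustrative sketch (assumed parameters, not from the paper): a DLRM-style
# sparse embedding lookup. Each query gathers a handful of rows from large
# embedding tables and reduces them, so the bytes moved per query dwarf the
# arithmetic, and DRAM bandwidth, not compute, bounds queries per second.

NUM_TABLES = 8             # assumed number of embedding tables
ROWS_PER_TABLE = 100_000   # assumed rows per table (real tables are far larger)
EMB_DIM = 64               # assumed embedding vector width
LOOKUPS_PER_TABLE = 32     # assumed sparse indices per table per query

rng = np.random.default_rng(0)
tables = [rng.standard_normal((ROWS_PER_TABLE, EMB_DIM), dtype=np.float32)
          for _ in range(NUM_TABLES)]

def embedding_query(tables, rng):
    """Gather and sum-pool sparse rows from each table for one query."""
    pooled = []
    for table in tables:
        idx = rng.integers(0, table.shape[0], size=LOOKUPS_PER_TABLE)
        pooled.append(table[idx].sum(axis=0))   # random-access DRAM reads
    return np.concatenate(pooled)               # dense feature for the MLP

feature = embedding_query(tables, rng)

# Bytes read vs. FLOPs for one query: each gathered row is read once and the
# sum-pooling does roughly one add per element, i.e. well under 1 FLOP per
# 4 bytes moved.
bytes_read = NUM_TABLES * LOOKUPS_PER_TABLE * EMB_DIM * 4
flops = NUM_TABLES * (LOOKUPS_PER_TABLE - 1) * EMB_DIM
print(f"bytes moved: {bytes_read}, FLOPs: {flops}, "
      f"arithmetic intensity ~ {flops / bytes_read:.2f} FLOP/byte")
```

The low FLOP-per-byte ratio of this gather-and-reduce stage is what makes embedding-dominated inference bandwidth-bound, and it motivates moving the operation close to DRAM, as a logic-to-DRAM hybrid-bonded PNM engine does.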
Key words
3D logic-to-DRAM hybrid bonding, process-near-memory engine, recommendation system, AI computing, memory-bound applications, natural language processing, graph analytics, graph neural networks, memory system capability, on-chip memory capacity, off-chip memory access, memory wall problem, computing-in-memory