25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications

2021 IEEE International Solid-State Circuits Conference (ISSCC)

Abstract
In recent years, artificial intelligence (AI) technology has proliferated rapidly and widely into application areas such as speech recognition, health care, and autonomous driving. To increase the capabilities of AI, more powerful systems are needed to process larger amounts of data. This requirement has made domain-specific accelerators, such as GPUs and TPUs, popular, as they can provide orders-of-magnitude higher performance than state-of-the-art CPUs. However, these accelerators can only operate at their peak performance when they receive the necessary data from memory as quickly as it is processed, which requires off-chip memory with high bandwidth and large capacity [1]. HBM has thus far met the bandwidth and capacity requirements [2]–[6], but recent AI workloads such as recurrent neural networks demand an even higher bandwidth than HBM provides [7]–[8]. While a further increase in off-chip bandwidth can be achieved by various techniques, it is often limited by power constraints at the chip or system level [9]. Hence, it is essential to reduce the demand for off-chip bandwidth with unconventional architectures, such as processing-in-memory. In this paper, we first present a function-in-memory DRAM (FIMDRAM) that integrates a 16-wide single-instruction multiple-data (SIMD) engine within the memory banks and exploits bank-level parallelism to provide 4× higher processing bandwidth than an off-chip memory solution. Second, we show techniques that require no modification to conventional memory controllers or their command protocols, making FIMDRAM practical for rapid industry adoption. Finally, we conclude with circuit- and system-level evaluations of our fabricated FIMDRAM.
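The headline figures can be sanity-checked with a back-of-envelope model. The Python sketch below is illustrative only: the 16-wide SIMD width comes from the abstract and the 1.2 TFLOPS figure from the title, while the computing-unit count, clock frequency, and bank concurrency factor are assumed values chosen to be consistent with those figures, not specifications stated in the paper.

# Back-of-envelope model of FIMDRAM peak throughput and bank-level
# parallelism gain. Only SIMD_WIDTH (16-wide engine) and the ~1.2 TFLOPS
# target are taken from the abstract/title; the rest are assumptions.

SIMD_WIDTH = 16        # 16-wide SIMD engine per in-bank computing unit (from abstract)
OPS_PER_MAC = 2        # one multiply-accumulate counted as two FLOPs (convention)
PCU_CLOCK_HZ = 300e6   # assumed in-bank computing-unit clock (hypothetical)
NUM_PCUS = 128         # assumed number of computing units across the stack (hypothetical)

peak_flops = NUM_PCUS * SIMD_WIDTH * OPS_PER_MAC * PCU_CLOCK_HZ
print(f"Peak throughput: {peak_flops / 1e12:.2f} TFLOPS")  # ~1.23 TFLOPS

# Bank-level parallelism: if several banks per pseudo-channel compute
# concurrently on data read through the wide internal column I/O, internal
# processing bandwidth scales with that concurrency, while off-chip
# streaming is serialized through the shared channel I/O.
BANKS_ACTIVE_PER_CHANNEL = 4  # hypothetical concurrency factor
print(f"Processing-bandwidth gain over off-chip streaming: {BANKS_ACTIVE_PER_CHANNEL}x")

Under these assumed parameters the model reproduces both the ~1.2 TFLOPS headline and the abstract's 4× processing-bandwidth claim, which illustrates why computing inside the banks sidesteps the power-limited off-chip interface.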
Keywords
6GB function-in-memory DRAM, HBM2, bank-level parallelism, machine learning applications, artificial intelligence technology, speech recognition, health care, autonomous driving, domain-specific accelerators, orders-of-magnitude higher performance, state-of-the-art CPU, peak performance, AI technologies, recurrent neural networks, HBM, off-chip bandwidth, power constraints, processing-in-memory, function-in-memory DRAM, 16-wide single-instruction multiple-data engine, memory banks, higher processing bandwidth, off-chip memory solution, conventional memory controllers, circuit- and system-level evaluations