GRAPHIC: Gather and Process Harmoniously in the Cache With High Parallelism and Flexibility

Yiming Chen,Mingyen Lee,Guohao Dai,Mufeng Zhou,Nagadastagiri Challapalle,Tianyi Wang, Yao Yu,Yongpan Liu,Yu Wang,Huazhong Yang,Vijaykrishnan Narayanan,Xueqing Li

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING（2024）

Cited 0|Views22

No score

Abstract

In-memory computing (IMC) has been proposed to overcome the von Neumann bottleneck in data-intensive applications. However, existing IMC solutions could not achieve both high parallelism and high flexibility, which limits their application in more general scenarios: As a highly parallel IMC design, the functionality of a MAC crossbar is limited to the matrix-vector multiplication; Another IMC method of logic-in-memory (LiM) is more flexible in supporting different logic functions, but has low parallelism. To improve the LiM parallelism, we are inspired by investigating how the single-instruction, multiple-data (SIMD) instruction set in conventional CPU could potentially help to expand the number of LiM operands in one cycle. The biggest challenge is the inefficiency in handling non-continuous data in parallel due to the SIMD limitation of (i) continuous address, (ii) limited cache bandwidth, and (iii) large full-resolution parallel computing overheads. This article presents GRAPHIC, the first reported in-memory SIMD architecture that solves the parallelism and irregular data access challenges in applying SIMD to LiM. GRAPHIC exploits content-addressable memory (CAM) and row-wise-accessible SRAM. By providing the in-situ, full-parallelism, and low-overhead operations of address search, cache read-compute-and-update, GRAPHIC accomplishes high-efficiency gather and aggregation with high parallelism, high energy efficiency, low latency, and low area overheads. Experiments in both continuous data access and irregular data pattern applications show an average speedup of 5x over iso-area AVX-like LiM, and 3-5x over the emerging CAM-based accelerators of CAPE and GaaS-X in advanced techniques.

Translated text

Key words

Associative memory,single instruction multiple data,FAST SRAM,in-memory computing,logic in memory

AI Read Science

Must-Reading Tree

Example

Generate MRT to find the research sequence of this paper

Chat Paper

Summary is being generated by the instructions you defined