AESPA: Asynchronous Execution Scheme to Exploit Bank-Level Parallelism of Processing-in-Memory

Hongju Kal, Chanyoung Yoo, Won Woo Ro

56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023

Abstract
This paper presents an asynchronous execution scheme that leverages the bank-level parallelism of near-bank processing-in-memory (PIM). We observe that performing memory operations underutilizes the parallelism of PIM computation because near-bank PIMs are designed to operate all banks synchronously. The all-bank computation can be delayed whenever one of the banks performs basic memory commands, such as read/write requests or activation/precharge operations. We aim to mitigate this throughput degradation and focus in particular on the execution delay caused by activation/precharge operations. Because all-bank execution accesses the same row in every bank, a large number of activation/precharge operations inevitably occur. Considering the timing parameter that limits the rate of row-open operations (tFAW), the throughput can decrease even further. To resolve this activation/precharge overhead, we propose AESPA, a new parallel execution scheme that operates banks asynchronously. AESPA differs from previous synchronous execution in that (1) each AESPA compute command targets a single bank, and (2) each processing unit computes data stored in multiple DRAM columns. Consequently, while one bank computes over multiple DRAM columns, the memory controller issues activation/precharge or PIM compute commands to the other banks. Thus, AESPA hides the activation latency of PIM computation and fully utilizes the aggregated bandwidth of the banks. To support this, we modify the hardware and software of previous near-bank PIM architectures for vector and matrix computation. In particular, we restructure inner-product-based matrix-vector multiplication to fit AESPA PIM: the previous matrix-vector multiplication requires data broadcasting and simultaneous computation across all processing units, whereas with the changed multiplication method AESPA PIM can transfer data to the respective processing units and start computation asynchronously. As a result, near-bank PIMs adopting AESPA achieve 33.5% and 59.5% speedups over two different state-of-the-art PIMs.
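
The scheduling idea described in the abstract can be illustrated with a toy cycle model. The Python sketch below is not the paper's simulator: the bank count, the DRAM timing values (tRRD, tFAW, tRCD), the per-column compute latency, and the omission of precharge are all assumptions made only for illustration. It contrasts the prior synchronous all-bank execution, in which lock-step compute waits until the last bank's row is open, with an AESPA-style asynchronous schedule, in which each compute command targets a single bank, covers multiple columns, and overlaps with activates to the other banks.

# Minimal cycle-count sketch (illustrative only): contrasts synchronous all-bank
# execution with an AESPA-style asynchronous, per-bank schedule.  All timing
# parameters and the omission of precharge are assumptions, not the paper's setup.

NUM_BANKS = 16   # banks behind one PIM command bus (assumed)
COLS = 8         # DRAM columns each processing unit computes per open row (assumed)
T_RRD = 4        # min cycles between activates to different banks (assumed)
T_FAW = 24       # window allowing at most four activates (tFAW, assumed)
T_RCD = 14       # activate-to-ready latency (assumed)
T_COL = 4        # cycles to compute one DRAM column (assumed)


def next_activate(acts, earliest):
    """Earliest legal issue time for a new activate given tRRD and the
    four-activate tFAW window (acts = issue times of all prior activates)."""
    t = earliest
    if acts:
        t = max(t, acts[-1] + T_RRD)
    if len(acts) >= 4:
        t = max(t, acts[-4] + T_FAW)
    return t


def synchronous_all_bank(rows):
    """Prior scheme: an all-bank compute command drives every bank in lock-step,
    so computing a row cannot start until the slowest (last-activated) bank has
    its row open, and no activates for the next row overlap with the compute."""
    t = 0
    acts = []
    for _ in range(rows):
        row_acts = []
        for _bank in range(NUM_BANKS):
            a = next_activate(acts, t)
            acts.append(a)
            row_acts.append(a)
        compute_start = row_acts[-1] + T_RCD   # wait for the last bank's row
        t = compute_start + COLS * T_COL       # all banks compute COLS columns together
    return t


def asynchronous_per_bank(rows):
    """AESPA-style scheme: each compute command targets a single bank and covers
    multiple columns, so a bank starts computing as soon as its own row is open
    while the controller keeps issuing activates/computes to the other banks."""
    bank_free = [0] * NUM_BANKS   # cycle when each bank finishes its current work
    acts = []                     # issue times of all activates (for tRRD/tFAW)
    finish = 0
    for _ in range(rows):
        for b in range(NUM_BANKS):
            a = next_activate(acts, bank_free[b])
            acts.append(a)
            bank_free[b] = a + T_RCD + COLS * T_COL   # per-bank, overlapped compute
            finish = max(finish, bank_free[b])
    return finish


if __name__ == "__main__":
    rows = 64
    sync_cycles = synchronous_all_bank(rows)
    async_cycles = asynchronous_per_bank(rows)
    print(f"synchronous all-bank : {sync_cycles} cycles")
    print(f"asynchronous per-bank: {async_cycles} cycles "
          f"({sync_cycles / async_cycles:.2f}x under these toy parameters)")

Under these toy parameters the per-bank schedule hides most of the activation latency behind computation, which is the effect the abstract describes; the 33.5% and 59.5% speedups come from the authors' evaluation, not from this sketch.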
Keywords
Processing-in-Memory, Execution Method, Matrix-Vector Multiplication