CorcPUM: Efficient Processing Using Cross-Point Memory via Cooperative Row-Column Access Pipelining and Adaptive Timing Optimization in Subarrays.

DAC (2023)

Abstract
Emerging cross-point memory can perform vector-matrix multiplication (VMM) in situ, enabling energy-efficient scientific computation. However, the row charging and discharging latency induced by parasitic capacitance is a major performance bottleneck for subarray-level VMM. We propose a memory-timing-compliant processing-using-memory design for bulk VMM that co-optimizes row access and column access by rethinking the read access commands and μ-op timing. First, a row-level-parallelism-adaptive timing termination mechanism exploits the nonlinear charging behavior of rows to reduce the tail latency of tRCD and tRP. Second, a bulk-interleaved, row-column-cooperative VMM access mechanism reduces tRAS and overlaps CL without increasing column ADC precision. Evaluations show that our design achieves a 5.03× performance speedup over an aggressive baseline.
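The abstract relies on three textbook ideas: a crossbar computes VMM by summing cell currents on each column, row charging follows a nonlinear (roughly exponential) RC curve, and overlapping column readout (CL) with the next row access hides latency. The sketch below illustrates only those generic ideas under a first-order RC assumption; it is not the CorcPUM design. The function names (`crossbar_vmm`, `rc_charge_time`, `pipelined_time`) and all parameter values are hypothetical.

```python
import math

# Hypothetical, simplified models for illustration only; they are not the
# CorcPUM design, and all parameter values below are made up.

def crossbar_vmm(voltages, conductances):
    """Ideal crossbar VMM: each column current is I_j = sum_i V_i * G[i][j]
    (Ohm's law at every cell, Kirchhoff's current law on every column)."""
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(n_cols)]

def rc_charge_time(rc_seconds, target_fraction):
    """Time for a first-order RC row to charge from 0 to target_fraction * Vdd,
    solved from V(t) = Vdd * (1 - exp(-t / RC))."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be in (0, 1)")
    return -rc_seconds * math.log(1.0 - target_fraction)

def pipelined_time(n_batches, t_row, t_col):
    """Generic two-stage pipeline: the column stage of batch k overlaps the
    row stage of batch k+1, so after the first batch only the slower stage repeats."""
    if n_batches == 0:
        return 0.0
    return t_row + t_col + (n_batches - 1) * max(t_row, t_col)

if __name__ == "__main__":
    # 2x2 example: voltages in V, conductances in uS, so currents come out in uA.
    print(crossbar_vmm([1.0, 0.5], [[1, 2], [3, 4]]))  # [2.5, 4.0]

    rc = 1e-9  # assumed 1 ns row RC time constant
    t90 = rc_charge_time(rc, 0.90)
    t99 = rc_charge_time(rc, 0.99)
    # Nonlinear charging: going from 90% to 99% takes as long as going from 0% to 90%.
    print(f"90%: {t90 * 1e9:.2f} ns, 99%: {t99 * 1e9:.2f} ns")

    serial = 8 * (5.0 + 3.0)  # 8 batches, no overlap (assumed 5 ns row, 3 ns column)
    print(serial, pipelined_time(8, 5.0, 3.0))  # 64.0 43.0
```

Under this toy RC model, the final few percent of the voltage swing dominate the charge time. That tail is why a fixed worst-case tRCD/tRP is costly, and why terminating adaptively, as the abstract describes, can pay off. The pipeline formula likewise shows why hiding CL behind row access helps most when the two stages have similar latency.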
Keywords
5.03× performance speedup, adaptive timing optimization, bulk-interleaved row-column, column access co-optimization, column ADC precision, cooperative row-column access pipelining, emerging cross-point memory, energy-efficient scientific computation, memory-timing-compliant bulk VMM, parasitic-capacitance-induced row, processing-using-memory design, read access commands, row access, row nonlinear charging, row-level-parallelism-adaptive timing termination mechanism, subarray VMM, subarrays, vector-matrix multiplication, μ-op timing