An Eight-Core RISC-V Processor With Compute Near Last Level Cache in Intel 4 CMOS

IEEE Journal of Solid-State Circuits (2022)

Abstract
An eight-core 64-b processor extends RISC-V to perform multiply–accumulate (MAC) within the shared last level cache (LLC). Instead of moving data from the LLC to the core, compute near last level cache (CNC) adds MAC to the LLC datapath and performs computation near where the data are stored. The RV64GC CNC instruction set architecture (ISA) extension performs digital MAC near unmodified SRAM arrays and has a low area overhead of 1.4%. CNC increases memory access width to 512 b per instruction by avoiding bottlenecks in the on-chip networks. The operation also reduces data movement by keeping MAC results and most input operands local to the LLC slices. CNC supports computation on cached data from main memory, coherent data sharing between cores, and virtual addressing. The CNC instructions are included in C++ programs and run either on bare metal or in Linux. The 1.15-GHz chip reduces energy consumption by 52$\times$ for fully connected and 29$\times$ for convolutional deep neural network (DNN) layers, compared to scalar operation. Two benchmarks are characterized: MLPerf Tiny Anomaly Detection v0.5 latency is reduced by 4.25$\times$ to 40 $\mu$s versus previous work, and inference latency on memory-augmented neural networks is improved by 4.1$\times$ versus scalar operation.
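To illustrate the difference between scalar operation and a 512-b access per instruction, the following is a minimal software sketch, not the CNC ISA itself. The abstract does not state the operand width, accumulator width, or instruction semantics, so the 8-bit operands, the 32-bit accumulator, and the function names `mac_scalar` and `mac_wide` are all assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 8-bit operands, so one 512-b access would carry 64 elements.
constexpr std::size_t kAccessBits = 512;
constexpr std::size_t kElemBits = 8;
constexpr std::size_t kLanes = kAccessBits / kElemBits;  // 64 elements

// Scalar baseline: one multiply-accumulate per loop iteration.
std::int32_t mac_scalar(const std::vector<std::int8_t>& a,
                        const std::vector<std::int8_t>& b) {
    assert(a.size() == b.size());
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
    }
    return acc;
}

// Wide model: each outer iteration stands in for one hypothetical 512-b
// access that covers up to kLanes elements. This is a plain software model;
// it only counts how many wide accesses the same dot product would need.
std::int32_t mac_wide(const std::vector<std::int8_t>& a,
                      const std::vector<std::int8_t>& b,
                      std::size_t* accesses = nullptr) {
    assert(a.size() == b.size());
    std::int32_t acc = 0;
    std::size_t count = 0;
    for (std::size_t base = 0; base < a.size(); base += kLanes) {
        const std::size_t end = std::min(base + kLanes, a.size());
        for (std::size_t i = base; i < end; ++i) {
            acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
        }
        ++count;  // one modeled wide access per chunk
    }
    if (accesses != nullptr) {
        *accesses = count;
    }
    return acc;
}
```

Under these assumptions, a 130-element dot product takes 130 scalar iterations but only three modeled wide accesses, and both functions return the same sum.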
Keywords
Cache memory, deep learning processing, machine learning processing, near-memory computation, RISC-V, single instruction multiple data (SIMD)