A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference

2020 IEEE Symposium on VLSI Circuits (2020)

Abstract
A processor core is presented for AI training and inference products. Leading-edge compute efficiency is achieved for robust fp16 training via efficient heterogeneous 2-D systolic array-SIMD compute engines leveraging compact DLFloat16 FPUs. Architectural flexibility is maintained for very high compute utilization across neural network topologies. A modular dual-corelet architecture with a shared scratchpad and a software-controlled network/memory interface enables scalability to many-core SoCs and large-scale systems. The 14nm AI core achieves fp16 peak performance of 3.0 TFLOPS at 0.62V and 1.4 TFLOPS/W at 0.54V.
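The abstract's "compact DLFloat16 FPUs" refer to IBM's DLFloat 16-bit floating-point format. As a rough illustration only, the sketch below encodes and decodes a DLFloat16-like word assuming the published layout (1 sign bit, 6 exponent bits with bias 31, 9 fraction bits, no subnormals); the rounding here is simple truncation rather than the hardware's actual rounding, and the function names are hypothetical.

```python
import struct

def to_dlfloat16(x: float) -> int:
    """Encode a Python float as a 16-bit word in a DLFloat16-like format.

    Assumed layout (per the published DLFloat format): 1 sign bit,
    6 exponent bits (bias 31), 9 fraction bits, no subnormals.
    Truncation is used instead of hardware rounding, for clarity.
    """
    bits = struct.unpack('>I', struct.pack('>f', x))[0]  # raw fp32 bits
    sign = bits >> 31
    exp32 = (bits >> 23) & 0xFF      # fp32 exponent field (bias 127)
    frac32 = bits & 0x7FFFFF         # fp32 23-bit fraction
    if exp32 == 0:                   # fp32 zero/subnormal -> signed zero
        return sign << 15
    e = exp32 - 127 + 31             # re-bias to the 6-bit exponent
    if e <= 0:                       # underflow: flush to zero
        return sign << 15
    if e >= 63:                      # overflow: saturate to max magnitude
        e, frac32 = 63, 0x7FFFFF
    return (sign << 15) | (e << 9) | (frac32 >> 14)  # keep top 9 fraction bits

def from_dlfloat16(w: int) -> float:
    """Decode a 16-bit DLFloat16-like word back to a Python float."""
    sign = -1.0 if (w >> 15) & 1 else 1.0
    e = (w >> 9) & 0x3F
    f = w & 0x1FF
    if e == 0 and f == 0:
        return sign * 0.0
    return sign * (1.0 + f / 512.0) * 2.0 ** (e - 31)
```

Values whose fp32 fraction fits in 9 bits round-trip exactly, e.g. `from_dlfloat16(to_dlfloat16(-2.5))` returns `-2.5`; the narrow 6-bit exponent is what keeps the FPU datapath compact.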
Keywords
neural network topologies, dual-corelet architecture, many-core SoC, AI core, fp16 peak performance, high compute utilization AI training, inference, leading-edge compute efficiency, robust fp16 training, 2-D systolic array-SIMD compute engines, architectural flexibility, compact DLFloat16 FPU, scalable processor core