9.1 A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling

2021 IEEE International Solid-State Circuits Conference (ISSCC), 2021

Abstract
Low-precision computation is the key enabling factor for achieving high compute densities (TOPS/W and TOPS/mm²) in AI hardware accelerators across cloud and edge platforms. However, deep learning (DL) model accuracy equivalent to that of high-precision computation must be maintained. Improvements in bandwidth, architecture, and power management are also required to harness the benefit of reduced precision, by feeding and supporting more parallel engines to achieve high sustained utilization and to optimize performance within a given product power envelope. In this work, we present a 4-core AI chip in 7nm EUV technology that exploits cutting-edge algorithmic advances for iso-accurate models in low-precision training and inference [1, 2] and aggressive circuit/architecture optimization to achieve leading-edge power-performance. The chip supports fp16 (DLFloat16 [8]) and hybrid-fp8 (hfp8) [1] formats for training and inference of DL models, as well as int4 and int2 formats for highly scaled inference.
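The hybrid-fp8 scheme cited above [1] pairs two 8-bit floating-point encodings: a higher-precision format (1 sign, 4 exponent, 3 mantissa bits) for the forward pass and a wider-range format (1-5-2) for gradients in the backward pass. As a rough illustration of what such formats represent, the sketch below rounds a value to the nearest number expressible with a given exponent/mantissa split; it is a simplified software emulation (no subnormals, infinities, or saturation handling), not the chip's actual datapath.

```python
import math

def quantize_fp8(x, exp_bits, man_bits, bias=None):
    """Round x to the nearest value representable in a sign/exponent/mantissa
    FP8-style format. Illustration only: no subnormal or inf/NaN handling."""
    if x == 0.0:
        return 0.0
    if bias is None:
        bias = 2 ** (exp_bits - 1) - 1  # conventional IEEE-style bias
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    # Exponent of the leading bit, clamped to the representable range.
    e = math.floor(math.log2(mag))
    e = max(min(e, (2 ** exp_bits - 1) - bias), 1 - bias)
    # Round the normalized significand to man_bits fractional bits.
    m = round(mag / 2 ** e * 2 ** man_bits) / 2 ** man_bits
    return sign * m * 2 ** e

# Forward-pass format (1-4-3): finer steps, smaller dynamic range.
print(quantize_fp8(0.1, exp_bits=4, man_bits=3))  # -> 0.1015625
# Backward-pass format (1-5-2): coarser steps, wider dynamic range.
print(quantize_fp8(0.1, exp_bits=5, man_bits=2))  # -> 0.09375
```

The two calls show the precision/range trade-off that motivates using different formats for activations and gradients: the 1-4-3 result lands closer to 0.1, while 1-5-2 sacrifices mantissa precision for extra exponent range.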
Keywords
workload-aware throttling, low-precision computation, AI hardware accelerators, cloud platforms, edge platforms, robust deep learning model, high-precision computation, power management, 4-core AI chip, EUV technology, iso-accurate models, DLFloat16, leading-edge power-performance, product power envelope, TOPS INT4 inference, aggressive circuit/architecture optimization, cutting-edge algorithmic advances