A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling

IEEE Journal of Solid-State Circuits (2022)

Abstract
Reduced-precision computation is a key enabler of energy-efficient acceleration for deep learning (DL) applications. This article presents a 7-nm four-core mixed-precision artificial intelligence (AI) chip that supports four compute precisions—FP16, Hybrid-FP8 (HFP8), INT4, and INT2—covering diverse application demands for both training and inference. The chip leverages cutting-edge algorithm...
Keywords
Training, Artificial intelligence, AI accelerators, Inference algorithms, Computer architecture, Bandwidth, System-on-chip
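The abstract highlights INT4 as one of the chip's inference precisions. As background, the sketch below shows the generic idea of symmetric 4-bit quantization with a per-tensor scale; it is purely illustrative and does not reflect this chip's actual datapath, and all function names and the scaling scheme are assumptions.

```python
import numpy as np

def quantize_int4(x: np.ndarray):
    """Map float values onto the signed 4-bit grid [-8, 7] using a
    per-tensor scale; returns (integer codes, scale)."""
    scale = float(np.max(np.abs(x))) / 7.0  # largest magnitude lands at code +/-7
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from the 4-bit codes."""
    return q.astype(np.float32) * scale

# Example: quantize a small tensor and inspect the reconstruction error.
x = np.array([0.9, -0.35, 0.02, -1.0], dtype=np.float32)
q, s = quantize_int4(x)
x_hat = dequantize_int4(q, s)
```

Compute on the low-bit codes, not the floats, is what yields the energy savings the abstract attributes to reduced precision; the scale is carried alongside and folded back in at dequantization.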