MTIA: First Generation Silicon Targeting Meta's Recommendation Systems
Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA 2023)
Abstract
Meta has traditionally relied on CPU-based servers for running inference workloads, specifically Deep Learning Recommendation Models (DLRM), but the increasing compute and memory requirements of these models have pushed the company towards specialized solutions such as GPUs or other hardware accelerators. This paper describes the company's effort in constructing its first silicon specifically designed for recommendation systems. It describes the accelerator architecture and platform design, presents the software stack for enabling and optimizing PyTorch-based models, and provides an initial performance evaluation. With our emerging software stack, we have made significant progress towards reaching the same or higher efficiency as the GPU: we averaged 0.9x perf/W across various DLRMs, and benchmarks show operators such as GEMMs reaching 2x perf/W. Finally, the paper describes the lessons we learned during this journey, which can improve the performance and programmability of future generations of the architecture.
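The efficiency comparison in the abstract is expressed as relative perf/W, i.e. the accelerator's performance per watt divided by the GPU baseline's. A minimal sketch of that arithmetic is below; the throughput and power figures are illustrative placeholders, not measurements from the paper.

```python
# Illustrative perf/W comparison; the numbers here are made up for the example,
# only the 0.9x ratio structure mirrors the abstract's metric.

def perf_per_watt(throughput: float, power_watts: float) -> float:
    """Performance per watt: units of work per second, per watt drawn."""
    return throughput / power_watts

def relative_efficiency(accel_pw: float, baseline_pw: float) -> float:
    """Accelerator perf/W expressed as a multiple of a baseline (e.g. a GPU)."""
    return accel_pw / baseline_pw

# Hypothetical example: accelerator at 90 units/s on 25 W vs. GPU at 400 units/s on 100 W
accel_pw = perf_per_watt(90.0, 25.0)    # 3.6 units/s per watt
gpu_pw = perf_per_watt(400.0, 100.0)    # 4.0 units/s per watt
print(relative_efficiency(accel_pw, gpu_pw))  # 0.9, i.e. 0.9x the GPU's efficiency
```

A ratio above 1.0 would indicate the accelerator beats the GPU baseline on energy efficiency, as the abstract reports for GEMM operators (2x perf/W).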
Keywords
Accelerators, Machine Learning, Inference, Recommendation Systems, Performance, Programmability