In-Package Domain-Specific ASICs for Intel® Stratix® 10 FPGAs: A Case Study of Accelerating Deep Learning Using TensorTile ASIC

2018 28th International Conference on Field Programmable Logic and Applications (FPL)

Abstract
FPGAs or ASICs? FPGAs are extremely flexible, while ASICs offer top efficiency. We believe that FPGAs and ASICs are better together, offering both flexibility and efficiency. We propose single-package heterogeneous 2.5D integration of FPGAs and ASICs using Intel's Embedded Multi-Die Interconnect Bridge (EMIB). Since the ASICs are separate chips from the FPGA, this approach (1) does not change the FPGA fabric, allowing re-use of existing ecosystems (FPGA chips, packaging, boards, software), and (2) allows freedom in ASIC design (area, frequency, process, etc., unconstrained by the FPGA fabric). Intel® Stratix® 10 FPGAs already have EMIBs, enabling single-package integration with other chips, or "tiles". We propose leveraging them to mix and match arbitrary domain-specific ASICs with Stratix 10 FPGAs. In particular, this work focuses on the deep learning (DL) domain, which demands efficient tensor (matrix/vector) operations. We propose TensorTile ASICs for Stratix 10 FPGAs to provide ASIC-level tensor performance, while relying on the FPGA's flexibility for application-specific operations (e.g., Winograd). Our evaluation shows: (1) a small TensorTile offers much better tensor throughput than a large Stratix 10-2800 FPGA; (2) mixing and matching FPGAs and TensorTiles provides scalable solutions (e.g., from ~69 peak INT8 TOPs with 1×TensorTile + a small Stratix 10-400 FPGA, to ~194 peak FP16 TOPs with 6×TensorTiles + a large Stratix 10-2800); (3) the AlexNet performance (performance/Watt) of Intel's DL FPGA design improves by 4× (3.3×) when enhanced with 2×TensorTiles.
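The abstract names Winograd as an example of an application-specific operation left to the FPGA fabric. As general background (not drawn from this paper), the classic 1-D Winograd F(2,3) algorithm computes two outputs of a 3-tap convolution with four multiplications instead of six; a minimal self-contained Python sketch:

```python
def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap convolution
    using 4 multiplies (direct computation needs 6).
    d: 4 input samples, g: 3 filter taps."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # The transformed filter terms (g0+g1+g2)/2 and (g0-g1+g2)/2
    # can be precomputed once per filter.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_conv3(d, g):
    """Direct 3-tap convolution producing two outputs (6 multiplies)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, -1.0, 2.0]
assert winograd_f23(d, g) == direct_conv3(d, g)  # both give [4.5, 6.0]
```

The multiply savings is what makes Winograd attractive for small convolution filters; the data and filter transforms are cheap adds and shifts, which suits FPGA fabric, while the bulk multiplications map onto a tensor accelerator.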
Keywords
Deep learning, system in package, architecture, accelerator