Creating High Performance Applications with Intel's FPGA OpenCL™ SDK

IWOCL (2017)

Abstract
After decades of research, High-Level Synthesis has finally caught on as a mainstream design technique for FPGAs. However, achieving performance comparable to designs written at the hardware description level remains a challenge. In this talk, we illustrate how we achieve world-class performance on HPC applications using OpenCL. Specifically, we show how we achieve 1 TFLOPS on a matrix multiply and over 1.3 TFLOPS on a CNN application, running on Intel's 20 nm Arria 10 FPGA device. By leveraging specific coding styles, we show how you can achieve peak performance on the FPGA without resorting to tedious hardware design languages. Finally, we describe spatial coding techniques that lead to efficient structures, such as systolic arrays, to ensure that the FPGA runs efficiently.