T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)(2019)

引用 60|浏览279
暂无评分
摘要
We present a language and compilation framework for productively generating high-performance systolic arrays for dense tensor kernels on spatial architectures, including FPGAs and CGRAs. It decouples a functional specification from a spatial mapping, allowing programmers to quickly explore various spatial optimizations for the same function. The actual implementation of these optimizations is left to a compiler. Thus, productivity and performance are achieved at the same time. We used this framework to implement several important dense tensor kernels. We implemented dense matrix multiply for an Arria-10 FPGA and a research CGRA, achieving 88% and 92% of the performance of manually written, and highly optimized expert (ninja") implementations in just 3% of their engineering time. Three other tensor kernels, including MTTKRP, TTM and TTMc, were also implemented with high performance and low design effort, and for the first time on spatial architectures."
更多
查看译文
关键词
Optimization,Computer architecture,Kernel,Field programmable gate arrays,Programming,Hardware
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要