Latte: Locality Aware Transformation for High-Level Synthesis

2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)(2018)

引用 22|浏览43
暂无评分
摘要
In this paper we classify the timing degradation problems using four common collective communication and computation patterns in HLS-based accelerator design: scatter, gather, broadcast and reduce. These widely used patterns scale poorly in one-to-all or all-to-one data movements between off-chip communication interface and on-chip storage, or inside the computation logic. Therefore, we propose the Latte microarchitecture featuring pipelined transfer controllers (PTC) along data paths in these patterns. Furthermore, we implement an automated framework to apply our Latte implementation in HLS with minimal user efforts. Our experiments show that Latte-optimized designs greatly improve the timing of baseline HLS designs by 1.50x with only 3.2% LUT overhead on average, and 2.66x with 2.7% overhead at maximum.
更多
查看译文
关键词
High level synthesis,Timing,Pipeline,Frequency Optimization,FPGA
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要