Augmenting HLS with Zero-Overhead Application-Specific Address Mapping for Optane DCPMM

2022 IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2022

Abstract
FPGAs have been introduced to datacenters as a mainstream computing device to accelerate a wide range of data-intensive applications when paired with heterogeneous memory. Leveraging High-Level Synthesis (HLS), application engineers can not only accelerate their applications but also reduce the time spent designing, debugging, and validating accelerators. However, existing HLS flows do not have effective support for emerging memory devices such as Intel's Optane DC Persistent Memory Modules (Optane DCPMM), a storage-class memory in a DIMM form factor. In fact, we observe that some HLS kernels can at best utilize only one-tenth of the total memory bandwidth of Optane DCPMM.

To remedy the poor performance of HLS with Optane DCPMM, we augment the existing HLS external memory interface with zero-overhead, application-specific address mapping capabilities. The proposed scheme uses both fine-grained information from variable access patterns and coarse-grained variable-interleaving information to select an optimal hybrid address mapping for high memory bandwidth utilization, in contrast to the default fixed address mapping in existing HLS. Furthermore, our scheme is compatible with existing tool flows such as the Intel FPGA SDK for OpenCL and the Vitis Application Flow, which keeps the adoption barrier low. We observe that by using our proposed address mapping scheme and interface, we achieve a 10× speedup on a diverse set of benchmarks, including merge join, matrix multiplication, and convolution, without any additional hardware cost.
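To make the role of address mapping concrete, the following is a minimal sketch of how the choice of channel-select bits can change how a strided access pattern spreads across memory channels. It is not the paper's algorithm: the channel count, the access granularity, the candidate bit positions, and every function name below are hypothetical, and the per-channel access count is only a rough proxy for bandwidth utilization.

```python
from collections import Counter

NUM_CHANNELS = 4   # hypothetical number of memory channels (power of two)


def channel_of(addr, channel_shift):
    """Channel selected by the log2(NUM_CHANNELS) address bits starting at channel_shift."""
    return (addr >> channel_shift) & (NUM_CHANNELS - 1)


def channel_histogram(addresses, channel_shift):
    """Count how many accesses land on each channel under a given mapping."""
    counts = Counter(channel_of(a, channel_shift) for a in addresses)
    return [counts.get(c, 0) for c in range(NUM_CHANNELS)]


def strided_addresses(n, stride_bytes):
    """Byte addresses of n accesses separated by a fixed stride."""
    return [i * stride_bytes for i in range(n)]


def best_shift(addresses, candidate_shifts):
    """Pick the candidate mapping whose busiest channel carries the fewest accesses."""
    return min(candidate_shifts, key=lambda s: max(channel_histogram(addresses, s)))


if __name__ == "__main__":
    addrs = strided_addresses(64, 256)          # 64 accesses, 256-byte stride
    for shift in (6, 8):
        print(shift, channel_histogram(addrs, shift))
    print("chosen shift:", best_shift(addrs, [6, 8, 12]))
```

With a 256-byte stride, selecting the channel from bits 6 and 7 sends every access to the same channel, while selecting it from bits 8 and 9 spreads the accesses evenly. A fixed default mapping cannot adapt to this, which is the kind of mismatch an application-specific mapping is meant to avoid.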
Keywords
DIMM form factor,HLS kernels,utilize only one-tenth,total memory bandwidth,Optane DCPMM,existing HLS external memory interface,application-specific address mapping capabilities,fine-grained information,variable-interleaving information,optimal hybrid address mapping,high memory bandwidth utilization,default fixed address mapping,Vitis Application Flow,address mapping scheme,zero-overhead application-specific address mapping,mainstream computing device,data-intensive applications,heterogeneous memory,High-Level Synthesis,application engineers,designing validating accelerators,debugging validating accelerators,existing HLS flows,memory devices,Intel's Optane DC Persistent Memory Modules,storage-class memory