Resource and memory management techniques for the high-level synthesis of software threads into parallel FPGA hardware

Jongsok Choi, Stephen Dean Brown, Jason Helge Anderson

2015 International Conference on Field Programmable Technology (FPT) (2015)

Abstract

Recent work has proposed the high-level synthesis (HLS) of parallel software programs (specified using Pthreads or OpenMP) into concurrently operating parallel hardware modules [6]. In this paper, we describe resource and memory management techniques for improving the performance and area of hardware generated by such software thread synthesis. One direction investigated pertains to how modules in the HLS-generated parallel hardware should connect to one another: 1) with a nested topology, or 2) with a flat topology. In the nested topology, hardware modules are created in a hierarchical manner: modules are instantiated inside the modules that use them. Conversely, the flat topology instantiates all hardware modules at the same level of hierarchy. For the flat topology, we describe a system generator that automatically generates the required interconnect between all hardware modules, and flexibly shares or replicates functions, functional units, and memories. We also explore methods to reduce memory contention among hardware units that operate in parallel, by investigating three memory architectures that use: 1) a global memory controller, 2) local memories, and 3) shared-local memories. Local and shared-local memories are RAM blocks dedicated to a single hardware module or to a set of hardware modules, respectively, and help to increase memory bandwidth by allowing concurrent memory accesses. We also consider memory replication to localize memories in hardware modules, and the conversion of small memories to registers, to further improve performance and memory usage. Finally, we describe the implementation in HLS hardware of locks and barriers, the synchronization constructs used in parallel programming. We show that our resource and memory management techniques can improve the geomean performance, area, and area-delay product of parallel HLS-generated hardware by up to 41.6%, 38.3%, and 63.3%, respectively, for a set of 15 benchmarks.

Keywords

HLS-generated parallel hardware, nested topology, flat topology, system generator, memory contention, global memory controller, shared-local memories, dedicated RAM blocks, memory bandwidth, concurrent memory accesses, memory replication, synchronization constructs, parallel programming, parallel FPGA hardware, software thread synthesis, memory management techniques, resource management techniques, concurrently operating parallel hardware modules, parallel software programs, high-level synthesis
