Architectural synthesis of computational pipelines with decoupled memory access

FPT(2014)

Abstract
As high-level synthesis (HLS) moves towards mainstream adoption among FPGA designers, it has proven to be an effective method for rapid hardware generation. However, when compute-intensive software kernels are offloaded to FPGA accelerators, current HLS tools do not always take full advantage of the hardware platform. In this paper, we present an automatic flow that refactors and restructures processor-centric software implementations to make them better suited to FPGA platforms. The methodology generates pipelines that decouple memory operations and data access from computation. The resulting pipelines achieve much higher throughput because they use memory bandwidth efficiently and tolerate data access latency better. The methodology complements existing work in high-level synthesis and eases the creation of heterogeneous systems that combine high-performance accelerators with general-purpose processors. With this approach, for a set of non-regular algorithm kernels written in C, a performance improvement of 3.3x to 9.1x is observed over direct C-to-hardware mapping using a state-of-the-art HLS tool.
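To make the idea of decoupling memory access from computation concrete, here is a minimal software sketch in C. It is an illustration under stated assumptions, not the paper's flow or its generated hardware: the kernel (an indirect gather followed by a multiply-accumulate), the names `gather_dot_direct`, `gather_dot_decoupled`, and `fifo_t`, and the `FIFO_DEPTH` value are all hypothetical. The first function is the processor-centric form, where each iteration depends on its own irregular load. The second splits the loop into an access stage that runs ahead and a compute stage that only reads from a FIFO, which is roughly the structure a decoupled pipeline would have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical depth; a real design would size it to cover memory latency. */
#define FIFO_DEPTH 8

/* Fixed-size FIFO standing in for the hardware queue between the two stages. */
typedef struct {
    int data[FIFO_DEPTH];
    size_t head, tail, count;
} fifo_t;

static int fifo_full(const fifo_t *f)  { return f->count == FIFO_DEPTH; }
static int fifo_empty(const fifo_t *f) { return f->count == 0; }

static void fifo_push(fifo_t *f, int v) {
    assert(!fifo_full(f));
    f->data[f->tail] = v;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
}

static int fifo_pop(fifo_t *f) {
    assert(!fifo_empty(f));
    int v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return v;
}

/* Processor-centric form: every iteration waits on its own indirect load. */
long gather_dot_direct(const int *a, const int *idx, const int *w, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)a[idx[i]] * w[i];
    return sum;
}

/* Decoupled form: an access stage issues loads ahead of a compute stage. */
long gather_dot_decoupled(const int *a, const int *idx, const int *w, size_t n) {
    fifo_t q = { {0}, 0, 0, 0 };
    size_t issued = 0, consumed = 0;
    long sum = 0;
    while (consumed < n) {
        /* Access stage: run ahead until the FIFO is full or all loads are issued. */
        while (issued < n && !fifo_full(&q))
            fifo_push(&q, a[idx[issued++]]);
        /* Compute stage: consume one operand; it never touches a[] or idx[]. */
        sum += (long)fifo_pop(&q) * w[consumed++];
    }
    return sum;
}
```

On a CPU the two functions do the same work sequentially, so this sketch shows only the restructuring, not the speedup. In hardware the two stages would run concurrently, letting several irregular loads be in flight while the compute stage keeps working on operands that have already arrived.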
Keywords
memory subsystem optimization, memory-level parallelism, hardware platforms, fpga platforms, decoupled memory access, direct c-to-hardware mapping, automatic flow, high performance accelerators, compute intensive software kernels, fpga designers, heterogeneous systems, pipeline parallelism, high-level synthesis, processor-centric software implementations, architectural synthesis, logic design, data access latency, memory operations, fpga, nonregular algorithm kernels, general purpose processors, computational pipelines, memory bandwidth, hardware acceleration, fpga accelerators, field programmable gate arrays, hls tools, rapid hardware generation, pipeline processing