Synthesis of interleaved multithreaded accelerators from OpenMP loops
2017 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 2017
Abstract
Like CPUs and GPUs, FPGA-based accelerators can profit from exploiting thread-level parallelism, so the synthesis tools that generate circuits from high-level languages need to be extended accordingly. We present an extension of the Nymble hardware/software co-compiler for the automatic synthesis of hardware accelerators from OpenMP worksharing loops, and describe modifications to the datapath and memory architecture for multithreaded execution. The new execution model employs both spatial and thread-level parallelism in the microarchitecture of the generated accelerator, with the aim of efficiently hiding memory access latencies. When executing four threads instead of a single one on the computing units, we achieve raw speedups of more than 3x and improve the utilization of the computing unit by more than 8x.
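For readers unfamiliar with the term, the following is a minimal sketch of an OpenMP worksharing loop in C, the kind of construct the abstract names as the compiler's input. It is not taken from the paper: the kernel `saxpy`, its parameters, and the choice of `schedule(static)` are hypothetical, and nothing here is meant to describe which OpenMP subset Nymble actually accepts.

```c
#include <assert.h>

/* Hypothetical kernel (not from the paper): y[i] = a * x[i] + y[i]. */
void saxpy(float a, const float *x, float *y, int n)
{
    assert(n >= 0);

    /* Worksharing loop: the iterations are divided among a team of
       four threads. Each iteration touches only its own y[i], so the
       iterations are independent of one another. */
    #pragma omp parallel for num_threads(4) schedule(static)
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

On a CPU this compiles with an OpenMP-enabled compiler (for example with `-fopenmp` on GCC or Clang); without that flag the pragma is ignored and the loop runs sequentially with the same result. Each iteration performs two loads and one store, which is the kind of memory traffic whose latency interleaved threads are meant to hide.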
Keywords
OpenMP loops, FPGA-based accelerators, thread-level parallelism, synthesis tools, high-level languages, automatic synthesis, hardware accelerators, OpenMP worksharing loops, multithreaded execution, execution model, microarchitecture, generated accelerator, memory access latencies, executing four threads, interleaved multithreaded accelerator synthesis, CPU, GPU, Nymble hardware/software-co-compiler, datapath-and memory-architecture, spatial parallelism