Multilevel Granularity Parallelism Synthesis on FPGAs

FCCM (2011)

Abstract
Recent progress in High-Level Synthesis (HLS) techniques has helped raise the abstraction level of FPGA programming. However, implementation and performance evaluation of the HLS-generated RTL involve lengthy logic synthesis and physical design flows. Moreover, the mapping of different levels of coarse-grained parallelism onto hardware spatial parallelism affects the final FPGA-based performance in terms of both cycles and frequency. Evaluating the rich design space through the full implementation flow, starting with high-level source code and ending with a routed netlist, is prohibitive in various scientific and computing domains, thus hindering the adoption of reconfigurable computing. This work presents a framework for multilevel granularity parallelism exploration with HLS-order efficiency. Our framework considers different granularities of parallelism for mapping CUDA kernels onto high-performance FPGA-based accelerators. We leverage resource and clock period models to estimate the impact of multi-granularity parallelism extraction on execution cycles and frequency. The proposed Multilevel Granularity Parallelism Synthesis (ML-GPS) framework employs an efficient design space search heuristic in tandem with the estimation models, as well as design layout information, to derive a configuration with near-optimal performance. Our experimental results demonstrate that ML-GPS can efficiently identify and generate CUDA kernel configurations that significantly outperform previous related tools, while offering competitive performance compared to software kernel execution on GPUs at a fraction of the energy cost.
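The idea of pairing estimation models with a design space search can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two parallelism knobs (`cores`, `unroll`), the linear resource model, the utilization-based clock period model, and all coefficients are hypothetical and do not come from the paper. The sketch also uses plain exhaustive enumeration rather than the ML-GPS search heuristic, which the abstract does not describe in detail. It only shows why cycles and clock period must be estimated together: more parallelism lowers the cycle count but can lengthen the clock period.

```python
from itertools import product

# Hypothetical constants for illustration only; not values from the paper.
TOTAL_WORK = 1_000_000   # abstract work items in the kernel
LUT_BUDGET = 100_000     # assumed FPGA resource budget (LUTs)


def estimate_luts(cores, unroll):
    """Toy resource model: each core has a base cost plus a per-unroll cost."""
    return cores * (2_000 + 1_500 * unroll)


def estimate_cycles(cores, unroll):
    """Toy cycle model: work divides evenly across cores and unrolled lanes."""
    return TOTAL_WORK / (cores * unroll)


def estimate_period_ns(cores, unroll):
    """Toy clock-period model: the period grows with resource utilization."""
    utilization = estimate_luts(cores, unroll) / LUT_BUDGET
    return 5.0 * (1.0 + 0.5 * utilization)


def estimate_latency_ns(cores, unroll):
    """Estimated latency is cycles multiplied by the estimated clock period."""
    return estimate_cycles(cores, unroll) * estimate_period_ns(cores, unroll)


def explore(core_options, unroll_options):
    """Exhaustively score every feasible configuration using the models.

    Returns (latency_ns, cores, unroll) for the best configuration,
    or None if no configuration fits the resource budget.
    """
    best = None
    for cores, unroll in product(core_options, unroll_options):
        if estimate_luts(cores, unroll) > LUT_BUDGET:
            continue  # skip configurations that exceed the budget
        latency = estimate_latency_ns(cores, unroll)
        if best is None or latency < best[0]:
            best = (latency, cores, unroll)
    return best


if __name__ == "__main__":
    print(explore([1, 2, 4, 8], [1, 2, 4]))
```

Because each candidate is scored by the models alone, no synthesis or place-and-route run is needed per design point. That is the property the abstract refers to as HLS-order efficiency; a real flow would replace the toy models with calibrated ones and the enumeration with a guided search.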
Keywords
design layout information, coarse grained parallelism, multilevel granularity parallelism synthesis, multilevel granularity parallelism exploration, competitive performance, multi-granularity parallelism extraction, final fpga-based performance, high performance, performance evaluation, performance near-optimal configuration, hardware spatial parallelism, logic synthesis, integrated circuit layout, source code, estimation, field programmable gate array, physical design, kernel, fpga, logic design, parallel computing, abstraction level, instruction sets, parallel computer, high level synthesis, field programmable gate arrays, reconfigurable computing, parallel processing