Portable and scalable FPGA-based acceleration of a direct linear system solver

ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2012

Abstract
FPGAs have the potential to serve as a platform for accelerating many computations, including scientific applications. However, the large development cost and short life span of FPGA designs have limited their adoption by the scientific computing community. FPGA-based scientific computing and many kinds of embedded computing could become more practical if there were hardware libraries that were portable to any FPGA-based system, with performance that scaled with the size of the FPGA. To illustrate this idea, we have implemented one common supercomputing library function: the LU factorization method for solving systems of linear equations. This paper describes a method for making the design both portable and scalable, which should serve as a model if such libraries are to be built in the future. The design is a software-based generator that leverages both the flexibility of a software programming language and the parameters inherent in a hardware description language. The generator accepts parameters that describe the FPGA capacity and external memory capabilities. We compare the performance of our engine executing on the largest FPGA available at the time of this work (an Altera Stratix III 3S340) to a single processor core fabricated in the same 65nm IC process running a highly optimized software implementation from the processor vendor. For single precision matrices on the order of 10,000 × 10,000 elements, the FPGA implementation is 2.2 times faster and the energy dissipated per useful GFLOP operation is 5 times lower. For double precision, the FPGA implementation is 1.7 times faster and 3.5 times more energy efficient.
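For readers unfamiliar with the library function named in the abstract, the following is a minimal software sketch of LU factorization used to solve a linear system. It is not the paper's hardware design: the partial-pivoting variant, the pure-Python list representation, and the function names `lu_factor` and `lu_solve` are assumptions chosen for illustration only.

```python
def lu_factor(a):
    """Factor a square matrix in place style: returns (lu, perm).

    lu packs the unit-lower-triangular L (below the diagonal) and U
    (on and above the diagonal) into one matrix. perm[i] is the index
    of the original row that ended up in position i.
    Partial pivoting is an assumption of this sketch.
    """
    n = len(a)
    lu = [row[:] for row in a]          # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate column k below the pivot and update the trailing block.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b given the packed factors from lu_factor."""
    n = len(lu)
    # Forward substitution: L y = P b (L has an implicit unit diagonal).
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    # Back substitution: U x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / lu[i][i]
    return x


if __name__ == "__main__":
    A = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
    b = [5.0, -2.0, 9.0]
    lu, perm = lu_factor(A)
    print(lu_solve(lu, perm, b))
```

The triple loop in `lu_factor` performs on the order of n³ floating-point operations, and the innermost update of the trailing block is the regular, data-parallel part of the computation, which is the kind of work a scalable accelerator would typically target.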
Keywords
scalable fpga-based acceleration,scientific computing community,fpga-based scientific computing,fpga design,scientific application,fpga capacity,lu factorization method,fpga-based system,direct linear system solver,embedded computing,optimized software implementation,fpga implementation,external memory,niobium,field programmable gate arrays,scientific computing,life span,bandwidth,linear system,energy dissipation,engines,linear systems,lu factorization,system on a chip