Dataflow Systolic Array Implementations of Matrix Decomposition Using High Level Synthesis.

FPGA (2019)

Abstract
Matrix decomposition is a fundamental topic in numerical algebra, with applications in a wide range of engineering fields. Many specialized systolic array structures for matrix decomposition algorithms have been proposed to maintain high performance as the problem size scales up. In this paper, we broadly explore different mappings of the most frequently used decomposition algorithms (Cholesky, LU, and QR) to systolic arrays. We follow the canonical mapping method to define the systolic array design space. By selecting different linear projection vectors on the dependency graph of each algorithm, we generate multiple one-dimensional and two-dimensional systolic arrays. To obtain better performance, we also introduce streaming dataflow in the top module, which enables heterogeneous PEs to work in a data-driven manner. All designs are implemented using the Xilinx Vivado High-Level Synthesis tools. Our experimental results show the differences in performance and resource consumption among the mappings. We also demonstrate up to 50.13x and 4.58x higher throughput for our implementations than the Xilinx HLS linear algebra library and the LAPACK library on CPUs, respectively.
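
The projection step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes a triangular Cholesky iteration space (points (i, j, k) with 0 <= k <= j <= i < n), and the projection vector `d`, the schedule vector `s`, and all function names are hypothetical choices made for illustration. Iteration points that lie on the same line along `d` collapse onto one processing element (PE), and the schedule vector assigns each point a time step. The sketch only checks that `s` is not orthogonal to `d`; it does not verify that `s` respects the data dependencies of the algorithm.

```python
from itertools import product

def cholesky_index_space(n):
    """Iteration points (i, j, k) with 0 <= k <= j <= i < n (assumed loop nest)."""
    return [(i, j, k) for i, j, k in product(range(n), repeat=3) if k <= j <= i]

def pe_key(point, d):
    """Identify the line through `point` along projection vector `d`.

    Two points get the same key exactly when their difference is parallel
    to `d`, so they are mapped to the same processing element.
    """
    m = next(idx for idx, c in enumerate(d) if c != 0)  # pivot component of d
    return tuple(d[m] * p - point[m] * c for p, c in zip(point, d))

def map_to_array(points, d, s):
    """Return (number of PEs, number of time steps) for projection d, schedule s."""
    if sum(a * b for a, b in zip(s, d)) == 0:
        raise ValueError("schedule vector must not be orthogonal to the projection vector")
    pes = {pe_key(p, d) for p in points}
    times = [sum(a * b for a, b in zip(s, p)) for p in points]
    return len(pes), max(times) - min(times) + 1

# Illustrative choice: project along the k axis, schedule with s = (1, 1, 1).
points = cholesky_index_space(4)
num_pes, num_steps = map_to_array(points, d=(0, 0, 1), s=(1, 1, 1))
print(num_pes, num_steps)
```

Projecting once, as here, yields a two-dimensional arrangement of PEs. Applying a second projection to the resulting PE coordinates is one way to arrive at a one-dimensional array, at the cost of more work per PE.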