Managing HBM Bandwidth on Multi-Die FPGAs with FPGA Overlay NoCs

Srinirdheeshwar Kuttuva Prakash, Hiren Patel, Nachiket Kapre

2022 IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2022

Abstract
We can improve HBM bandwidth distribution and utilization on a multi-die FPGA like the Xilinx Alveo U280 by using Overlay Network-on-Chips (NoCs). The HBM in the Xilinx Alveo U280 offers 8 GB of memory capacity with a theoretical maximum bandwidth of 460 GBps, but exposes all of its HBM ports to the FPGA fabric on only one die. As a result, computing elements assigned to the other dies must use the scarce Super Long Lines (SLLs) to access HBM bandwidth. Furthermore, the HBM is fractured internally into thirty-two smaller memories called pseudo channels, connected by a hardened, performance-limited crossbar. The crossbar enables global accesses from any of the HBM ports, but introduces several throughput bottlenecks. An overlay Hybrid NoC that combines a Hoplite NoC with Butterfly Fat Tree (BFT) NoCs offers a high-performance solution for distributing HBM bandwidth across all three dies. The routing capability of the NoC can be modified to supplant the internal crossbar of the Xilinx HBM for global accesses. We demonstrate this on the Xilinx Alveo U280 with BFT, Hoplite, and Hybrid NoCs, using synthetic benchmarks and two application-based benchmarks, dense matrix-matrix multiplication (DMM) and sparse matrix-vector multiplication (SpMV). Our experiments show that Overlay NoCs can improve throughput by 1.26× on synthetic benchmarks and by up to 1.4× on SpMV workloads.
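To put the abstract's headline figures in context, the short sketch below divides the quoted 8 GB capacity and 460 GBps peak bandwidth evenly across the thirty-two pseudo channels and estimates the peak bandwidth visible to a kernel that can reach only a subset of them, as a kernel placed on a die without direct HBM ports might. The even-split assumption and the reachable_bandwidth helper are illustrative only and are not taken from the paper.

```python
# Back-of-the-envelope split of the Alveo U280 HBM figures quoted in the abstract.
# Assumption (ours): capacity and bandwidth divide evenly across pseudo channels.

TOTAL_CAPACITY_GB = 8.0      # HBM capacity quoted in the abstract
TOTAL_BW_GBPS = 460.0        # theoretical peak bandwidth quoted in the abstract
NUM_PSEUDO_CHANNELS = 32     # HBM is fractured into thirty-two pseudo channels

capacity_per_pc = TOTAL_CAPACITY_GB / NUM_PSEUDO_CHANNELS   # 0.25 GB per pseudo channel
bw_per_pc = TOTAL_BW_GBPS / NUM_PSEUDO_CHANNELS             # ~14.4 GBps per pseudo channel

def reachable_bandwidth(num_pcs_reachable: int) -> float:
    """Peak bandwidth visible to a kernel that can only reach a subset of
    pseudo channels (hypothetical helper, for illustration only)."""
    return bw_per_pc * num_pcs_reachable

if __name__ == "__main__":
    print(f"Per pseudo channel: {capacity_per_pc:.2f} GB, {bw_per_pc:.1f} GBps")
    for n in (4, 8, 32):
        print(f"{n:2d} pseudo channels reachable -> {reachable_bandwidth(n):.0f} GBps peak")
```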
Keywords
HBM bandwidth distribution, Xilinx Alveo U280, memory capacity, HBM ports, FPGA fabric, access HBM bandwidth, global accesses, Hoplite NoC, distributing HBM bandwidth, internal crossbar, Xilinx HBM, multi-die FPGA, overlay network-on-chips, FPGA overlay NoC, overlay hybrid NoC, scarce super long lines, Butterfly Fat Tree NoC, memory size 8.0 GByte