Partitioning FPGA-Optimized Systolic Arrays for Fun and Profit
ICFPT, pp. 144-152, 2019.
We can improve the inference throughput of deep convolutional networks mapped to FPGA-optimized systolic arrays, at the expense of latency, with array partitioning and layer pipelining. Modern convolutional networks have a growing number of layers, such as the 58 separable layer GoogleNetv1, with varying compute, storage, and data movemen...
Best Paper of FPT, 2019