An Irredundant and Compressed Data Layout to Optimize Bandwidth Utilization of FPGA Accelerators
CoRR (2024)
Abstract
Memory bandwidth is a well-known performance bottleneck for FPGA
accelerators, especially when they process large multi-dimensional data sets.
A large body of work focuses on reducing off-chip transfers, but few authors
try to improve the efficiency of those transfers. This paper addresses the latter
issue by proposing (i) a compiler-based approach to the accelerator's data layout
that maximizes contiguous access to off-chip memory, and (ii) data packing and
runtime compression techniques that take advantage of this layout to further
improve memory performance. We show that our approach can reduce I/O
cycles by up to 7× compared to unoptimized memory accesses.
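Why contiguous access matters can be illustrated with a small, hypothetical sketch. It is not the paper's compiler-based layout method. Assume an 8×8 array stored in off-chip memory and an accelerator that reads one 4×4 tile. The sketch counts how many runs of consecutive addresses are needed, where each run roughly corresponds to a separate burst. It compares a plain row-major layout with a layout that stores each tile contiguously. The helper names (row_major_addr, tiled_addr, tile_addrs, count_runs) are illustrative assumptions only.

```python
# Hypothetical illustration: contiguous address runs needed to fetch one tile
# under a row-major layout versus a tile-contiguous layout.
# This is not the paper's layout algorithm; all names are illustrative.

def row_major_addr(i, j, ncols):
    """Linear address of element (i, j) in a row-major array."""
    return i * ncols + j

def tiled_addr(i, j, ncols, tile):
    """Linear address when each tile x tile block is stored contiguously.
    Assumes ncols is a multiple of tile."""
    tiles_per_row = ncols // tile
    block = (i // tile) * tiles_per_row + (j // tile)
    return block * tile * tile + (i % tile) * tile + (j % tile)

def tile_addrs(i0, j0, tile, addr_fn):
    """All addresses touched when reading the tile whose corner is (i0, j0)."""
    return [addr_fn(i, j)
            for i in range(i0, i0 + tile)
            for j in range(j0, j0 + tile)]

def count_runs(addrs):
    """Number of maximal runs of consecutive addresses (roughly, bursts)."""
    addrs = sorted(addrs)
    if not addrs:
        return 0
    runs = 1
    for prev, cur in zip(addrs, addrs[1:]):
        if cur != prev + 1:
            runs += 1
    return runs

if __name__ == "__main__":
    N, T = 8, 4
    rm = tile_addrs(4, 4, T, lambda i, j: row_major_addr(i, j, N))
    tl = tile_addrs(4, 4, T, lambda i, j: tiled_addr(i, j, N, T))
    print("row-major runs:", count_runs(rm))  # 4 (one run per tile row)
    print("tiled runs:", count_runs(tl))      # 1 (whole tile is contiguous)
```

With the tile-contiguous layout, the whole tile becomes a single long run instead of one short run per row. This is the kind of contiguity the abstract targets. Real accelerator layouts must also handle overlapping or irregular access patterns, which this sketch ignores.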