MAX-PolyMem: High-Bandwidth Polymorphic Parallel Memories for DFEs

2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018

Memory bandwidth is critical for many modern applications and architectures. Parallel memories can alleviate this bottleneck, but they are difficult to design and implement for non-trivial access patterns. Our work addresses these challenges through PolyMem, a Polymorphic Parallel Memory that acts as a software cache and enables parallel memory accesses for different combinations of access patterns. PolyMem's design is based on the Polymorphic Register File (PRF), which guarantees conflict-free parallel access for memory access patterns widely used in scientific and multimedia applications. We present in detail the design and implementation of MAX-PolyMem, the first native implementation of PolyMem using Maxeler's toolchain. Our design supports multiple lanes, multiple read ports, and concurrent read and write operations. We further provide a detailed empirical analysis of the performance of MAX-PolyMem, consisting of two sets of results. First, we conduct a thorough design space exploration to determine the best configurations and the performance bounds of MAX-PolyMem. For example, the design with the maximum read bandwidth is a 512 KB memory with 4 read ports, running at 137 MHz, which can reach a peak read bandwidth of around 32 GB/s. Second, to determine whether MAX-PolyMem can reach and sustain these peaks, we implement the STREAM-Copy benchmark. The benchmarking results demonstrate that, in practice, MAX-PolyMem reaches over 99% of the theoretical peak performance.
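The peak read bandwidth quoted above can be related to the port count and clock frequency with a simple back-of-the-envelope model. The sketch below is illustrative only: the 4 read ports and 137 MHz come from the abstract, while the lane count (8) and word size (8 bytes) are assumptions made here and are not stated in the abstract. The function names are hypothetical.

```python
def peak_read_bandwidth(read_ports, lanes, word_bytes, freq_hz):
    """Bytes per second if every lane of every read port returns one word per cycle."""
    return read_ports * lanes * word_bytes * freq_hz


def efficiency(sustained_bytes_per_s, peak_bytes_per_s):
    """Fraction of the theoretical peak that a benchmark actually sustains."""
    return sustained_bytes_per_s / peak_bytes_per_s


# From the abstract: 4 read ports, 137 MHz.
# ASSUMED (not in the abstract): 8 lanes, 8-byte (64-bit) words.
peak = peak_read_bandwidth(read_ports=4, lanes=8, word_bytes=8, freq_hz=137e6)

peak_gb = peak / 1e9      # decimal gigabytes per second
peak_gib = peak / 2**30   # binary gibibytes per second

print(f"peak: {peak_gb:.2f} GB/s ({peak_gib:.2f} GiB/s)")
```

Under these assumed parameters the model yields a value in the low-to-mid 30s of GB/s, the same order as the reported figure of around 32 GB/s. The `efficiency` helper shows how a measured STREAM-Copy throughput would be compared against such a peak; no measured value is supplied here because the abstract reports only the ratio (over 99%).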
Polymorphic Parallel Memories, Maxeler FPGAs, HLS, STREAM benchmark, Dataflow Engine, Polymorphic Register File