ACTS: A Near-Memory FPGA Graph Processing Framework

PROCEEDINGS OF THE 2023 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD PROGRAMMABLE GATE ARRAYS, FPGA 2023(2023)

引用 1|浏览7
暂无评分
摘要
Despite the high off-chip bandwidth and on-chip parallelism offered by today's near-memory accelerators, software-based (CPU and GPU) graph processing frameworks still suffer performance degradation from under-utilization of available memory bandwidth because graph traversal often exhibits poor locality. Emerging FPGA-based graph accelerators tackle this challenge by designing specialized graph processing pipelines and application-specific memory subsystems to maximize bandwidth utilization and efficiently utilize high-speed on-chip memory. To use the limited on-chip (BRAM) memory effectively while handling larger graph sizes, several FPGA-based solutions resort to some form of graph slicing or partitioning during pre-processing to stage vertex property data into the BRAM. While this has demonstrated performance superiority for small graphs, this approach breaks down with larger graph sizes. For example, GraphLily [19], a recent high-performance FPGA-based graph accelerator, experiences up to 11X performance degradation between graphs having 3M vertices and 28M vertices. This makes prior FPGA approaches impractical for large graphs. We propose ACTS, an HBM-enabled FPGA graph accelerator, to address this problem. Rather than partitioning the graph offline to improve spatial locality, we partition vertex-update messages (based on destination vertex IDs) generated online after active edges have been processed. This optimizes read bandwidth even as the graph size scales. We compare ACTS against Gunrock, a state-of-the-art graph processing accelerator for the GPU, and GraphLily, a recent FPGA-based graph accelerator also utilizing HBM memory. Our results show a geometric mean speedup of 1.5X, with a maximum speedup of 4.6X over Gunrock, and a geometric speedup of 3.6X, with a maximum speedup of 16.5X, over GraphLily. Our results also showed a geometric mean power reduction of 50% and a mean reduction of energy-delay product of 88% over Gunrock.
更多
查看译文
关键词
graph analytics,high memory bandwidth (HBM),FPGA
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要