Accelerating Large-Scale Graph Analytics with FPGA and HMC

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)(2017)

引用 1|浏览23
暂无评分
摘要
Graph analytics that explores the relationship among interconnected entities is becoming increasingly important due to its broad applicability from machine learning to social science. However, one major challenge for graph processing systems is the irregular data access pattern of graph computation which can significantly degrade the performance. The algorithms, software, and hardware that have been tailored for mainstream parallel applications are, as a result, generally not effective for massive-scale sparse graphs from the real world due to their complexity and irregularity. To address the performance issues in large-scale graph analytics, we combine the emerging Hybrid Memory Cube (HMC) with a modern FPGA in order to achieve exceptional random access performance without any loss of flexibility or efficiency in computation. In particular, we develop collaborative software/hardware techniques to perform a level-synchronized breadth first search (BFS) on the FPGA-HMC platform. From the software perspective, we develop an architecture-aware graph clustering algorithm that fully exploits the platform's capability to improve data locality and memory access efficiency. For each input graph, this algorithm provides an efficient data layout that allows the FPGA to coalesce memory requests into the largest possible HMC payload requests so that the number of memory requests, which is the primary factor in runtime, can be minimized. From the hardware perspective, we further improve the FPGA-HMC graph processor architecture by adding a merging unit. The merging unit takes the best advantage of the increased data locality resulting from graph clustering. We evaluated the performance of our BFS implementation using the AC-510 development kit from Micron over a set of benchmarks from a wide range of applications. We observed that the combination of the clustering algorithm and the merging hardware achieved 2.8 × average performance improvement compared to the latest FPGA-HMC based graph processing system.
更多
查看译文
关键词
Hybrid memory Cube,Graph Clustering,Breadth-First Search
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要