Fast Sparse GPU Kernels for Accelerated Training of Graph Neural Networks

IPDPS (2023)

Abstract
Graph Neural Networks (GNNs) have recently gained significant traction as they achieve state-of-the-art performance on various graph-related problems. GNN training typically follows the standard message-passing paradigm, in which SpMM and SDDMM are the two essential sparse kernels. However, existing sparse GPU kernels are inefficient and may suffer from load imbalance, dynamics in GNN computation, poor memory efficiency, and the tail effect. We propose two new kernels, Hybrid-Parallel SpMM (HP-SpMM) and Hybrid-Parallel SDDMM (HP-SDDMM), that efficiently perform SpMM and SDDMM on GPUs with a unified hybrid parallel strategy that mixes node-level and edge-level parallelism. In view of the emerging graph-sampling training, we design the Dynamic Task Partition (DTP) method to minimize the tail effect by exposing sufficient parallelism. We further devise a Hierarchical Vectorized Memory Access scheme to achieve aligned global memory accesses and enable vectorized instructions for improved memory efficiency. We also propose to enhance data locality by reordering graphs with a graph clustering method. Experiments on extensive sparse matrices collected from real GNN applications demonstrate that our kernels achieve significant performance improvements over state-of-the-art implementations. We implement our sparse kernels in popular GNN frameworks and use them to train various GNN models, including the GCN model in full-graph mode and the GraphSAINT model in graph-sampling mode. Evaluation results show that our kernels accelerate GNN training by up to 1.72x.
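For context on the two kernels the abstract names: SpMM multiplies a sparse adjacency matrix by a dense feature matrix (neighbor aggregation), while SDDMM computes a dense-dense product sampled only at the sparse matrix's nonzero positions (e.g. per-edge scores). A minimal SciPy/NumPy sketch of the semantics only, not the paper's GPU implementation; the function names and toy graph are illustrative:

```python
import numpy as np
import scipy.sparse as sp

def spmm(adj, h):
    """SpMM: sparse (N x N) adjacency times dense (N x F) features.
    This is the neighbor-aggregation step of GNN message passing."""
    return adj @ h  # SciPy dispatches a sparse-dense product

def sddmm(adj, a, b):
    """SDDMM: compute a @ b.T only at the nonzero positions of adj,
    returning a sparse matrix with the same sparsity pattern."""
    rows, cols = adj.nonzero()
    vals = np.einsum("ij,ij->i", a[rows], b[cols])  # one dot product per edge
    return sp.coo_matrix((vals, (rows, cols)), shape=adj.shape)

# Toy directed graph with 3 nodes and edges 0->1, 1->2, 2->0
adj = sp.coo_matrix(
    ([1.0, 1.0, 1.0], ([0, 1, 2], [1, 2, 0])), shape=(3, 3)
).tocsr()
h = np.arange(6, dtype=np.float64).reshape(3, 2)  # node features

out = spmm(adj, h)        # each row i holds the features of i's neighbor
scores = sddmm(adj, h, h) # per-edge similarity scores
```

The dense reference `adj.toarray() @ h` agrees with `spmm`, and `sddmm` matches `(h @ h.T)` masked to the edge set, which is how the sketch can be checked.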
Keywords
accelerated training, dynamic task partition method, fast sparse GPU kernels, full-graph mode, global memory accesses, GNN computing, GNN training, graph clustering method, graph neural networks, graph-related problems, graph-sampling mode, graph-sampling training, hierarchical vectorized memory access scheme, HP-SpMM, hybrid-parallel SpMM, memory efficiency, sparse matrices, standard message passing paradigm, tail effect, unified hybrid parallel strategy, vectorized instructions