PruneGNN: Algorithm-Architecture Pruning Framework for Graph Neural Network Acceleration.

International Symposium on High-Performance Computer Architecture (HPCA), 2024

Abstract
Performing training and inference for Graph Neural Networks (GNNs) under tight latency constraints has become increasingly difficult as real-world input graphs continue to grow. Compared to traditional DNNs, GNNs present unique computational challenges due to their massive, unstructured, and sparse input graphs. Prior works have applied irregular and structured model pruning techniques to reduce the complexity of GNNs and accelerate their execution. However, the irregular pruning techniques presented in the literature use floating-point operation counts to estimate GNN performance, which does not reveal the true performance implications of model sparsity caused by the diminished parallelism of sparse matrix multiplication kernels. This paper quantitatively shows that irregular sparsity in GNN models cannot be exploited to improve performance on parallel architectures that employ highly vectorized hardware. While structured pruning can overcome these issues, the existing structured pruning work for GNNs introduces performance scalability challenges, as the low-dimensional mapping of the pruned model is unable to exploit the full parallelism potential of the GPU's vectorized hardware. We propose PruneGNN, an optimized algorithm-architecture framework for structured GNN pruning. At the algorithm level, a dimension-pruning-aware sparse training method is proposed that achieves high sparsity while maintaining accuracy. At the architecture level, novel SIMD-aware kernels are proposed that exploit matrix-operator-level parallelism and unlock performance gains with reduced-dimension GNN models. The efficacy of the proposed framework is evaluated for end-to-end inference as well as training performance using real-world dynamic and static graphs on representative GNN models. Experimental results using an NVIDIA A100 GPU show that PruneGNN achieves an average of 2x speedup over prior structured pruning work for state-of-the-art GNN models.
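The abstract does not give implementation details, so the following is only a minimal PyTorch-style sketch of what structured (dimension) pruning of a GNN layer's weight matrix looks like, to illustrate why removing whole hidden dimensions yields a smaller dense matrix that vectorized GPU hardware can consume directly, whereas irregular (element-wise) pruning leaves a same-sized sparse matrix. The function name `prune_hidden_dims` and the magnitude-based column criterion are illustrative assumptions, not the paper's actual method.

```python
# Illustrative sketch (not the paper's implementation): structured "dimension"
# pruning of a GCN-style layer. Whole hidden dimensions (weight-matrix columns)
# are removed based on their L2 norm, producing a smaller *dense* weight matrix,
# in contrast to irregular pruning, which zeroes individual weights and leaves
# a sparse matrix of the original size.
import torch

def prune_hidden_dims(weight: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Keep the `keep_ratio` fraction of output dimensions (columns) with the
    largest L2 norm and drop the rest. `weight` has shape (in_features, hidden_dim)."""
    hidden_dim = weight.shape[1]
    n_keep = max(1, int(hidden_dim * keep_ratio))
    col_norms = weight.norm(dim=0)                    # one score per hidden dimension
    keep_idx = torch.topk(col_norms, n_keep).indices  # dimensions to retain
    return weight[:, keep_idx.sort().values]          # smaller dense matrix

# Example: a 256-dim hidden layer pruned to 25% of its dimensions.
W = torch.randn(128, 256)
W_pruned = prune_hidden_dims(W, keep_ratio=0.25)
print(W.shape, "->", W_pruned.shape)  # (128, 256) -> (128, 64)
```

The reduced-dimension matrix remains dense, so subsequent matrix multiplications can still use highly vectorized GEMM kernels; the paper's SIMD-aware kernels additionally exploit matrix-operator-level parallelism to keep the GPU occupied when these dimensions become small.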
Keywords
Neural Network, Graph Neural Networks, Pruning Framework, Deep Neural Network, Parallelization, Unique Challenges, Matrix Multiplication, Sparse Matrix, Sparse Model, Computational Challenges, Sparse Graph, Input Graph, Static Graph, High Sparsity, Graph Neural Network Model, Pruning Techniques, Weight Matrix, Model Weights, Size Properties, Layer Model, Graph Convolutional Network Model, Graph Convolutional Network, Graph Attention Network, Multiple Threads, Representative Graph, Density Model, Hidden Dimension, Large Graphs, Temporal Graph