Accelerating GNN Training by Adapting Large Graphs to Distributed Heterogeneous Architectures

IEEE Transactions on Computers (2023)

Abstract
Graph neural networks (GNNs) have been successfully applied to many important application domains on graph data. As graphs become increasingly large, existing GNN training frameworks typically use mini-batch sampling during feature aggregation to lower resource burdens, but they suffer from long memory access latency and inefficient transfer of vertex features from CPU to GPU. This paper proposes 2PGraph, a system that addresses these limitations of mini-batch sampling and feature aggregation and supports fast and efficient single-GPU and distributed GNN training. First, 2PGraph presents a locality-aware GNN training scheduling method that schedules vertices based on the locality of the graph topology, significantly accelerating sampling and aggregation, improving the data locality of vertex accesses, and limiting the range of neighborhood expansion. Second, 2PGraph proposes a GNN-layer-aware feature caching method that uses available GPU resources to achieve a hit rate of up to 100%, avoiding redundant data transfer between CPU and GPU. Third, 2PGraph presents a self-dependence cluster-based graph partitioning method that achieves high sampling and cache efficiency in distributed environments. Experimental results on real-world graph datasets show that 2PGraph reduces the memory access latency of mini-batch sampling by up to 90% and data transfer time by up to 99%. For distributed GNN training over an 8-GPU cluster, 2PGraph achieves up to 8.7x performance speedup over state-of-the-art approaches.
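To illustrate the kind of GPU-side feature caching the abstract describes, the following is a minimal PyTorch-style sketch. It is not the paper's implementation: the class name FeatureCache, its interface, and the degree-based "hot vertex" selection in the usage comments are illustrative assumptions; 2PGraph's actual GNN-layer-aware policy for choosing which features to cache is described in the paper itself.

```python
import torch

class FeatureCache:
    """Illustrative GPU-side vertex-feature cache (not the paper's exact code).

    Features of a chosen set of "hot" vertices are kept resident on the GPU,
    so gathering a mini-batch's features only transfers cache misses from
    CPU to GPU -- the effect that GNN-layer-aware caching targets.
    """

    def __init__(self, cpu_feats: torch.Tensor, hot_ids: torch.Tensor, device: str = "cuda"):
        self.cpu_feats = cpu_feats                         # [N, F] host-resident features
        self.device = device
        self.gpu_feats = cpu_feats[hot_ids].to(device)     # hot features kept on the GPU
        # Map global vertex id -> slot in the GPU cache (-1 means not cached).
        self.slot = torch.full((cpu_feats.size(0),), -1, dtype=torch.long)
        self.slot[hot_ids] = torch.arange(hot_ids.numel())

    def gather(self, batch_ids: torch.Tensor) -> torch.Tensor:
        """Return a [B, F] GPU tensor of features for a sampled mini-batch."""
        slots = self.slot[batch_ids]                       # cache slot (or -1) per vertex
        hit = slots >= 0
        out = torch.empty(batch_ids.numel(), self.cpu_feats.size(1),
                          dtype=self.cpu_feats.dtype, device=self.device)
        hit_dev = hit.to(self.device)
        out[hit_dev] = self.gpu_feats[slots[hit].to(self.device)]      # served from cache
        miss_ids = batch_ids[~hit]
        if miss_ids.numel() > 0:                           # only misses cross the PCIe bus
            out[~hit_dev] = self.cpu_feats[miss_ids].to(self.device, non_blocking=True)
        return out

# Hypothetical usage: cache the highest-degree vertices that fit in GPU memory.
# feats = torch.randn(num_vertices, feat_dim)             # host features
# hot_ids = degrees.topk(cache_size).indices              # stand-in selection policy
# cache = FeatureCache(feats, hot_ids)
# batch_feats = cache.gather(sampled_vertex_ids)
```

In a sketch like this, the fraction of mini-batch vertices found in the GPU-resident set is the cache hit rate; the closer it is to 100%, the less vertex-feature traffic crosses from CPU to GPU during training.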
Keywords
Training, Graphics processing units, Graph neural networks, Loading, Pipelines, Distributed databases, Social networking (online), pipeline parallel, data parallel, sampling, dataloading, cache