ADGraph: Accurate, Distributed Training on Large Graphs

Computer Science & Information Technology (CS & IT), 2021

Abstract
Graph neural networks (GNNs) have emerged as powerful learning tools for recommendation systems, social networks, and knowledge graphs. In these domains the scale of graph data is immense, so distributed graph learning is required for efficient GNN training. Graph partition-based methods are widely adopted to scale graph training. However, most previous works focus on scalability rather than accuracy and are not thoroughly evaluated on large-scale graphs. In this paper, we introduce ADGraph (accurate and distributed training on large graphs), which explores how to improve accuracy while preserving the scalability of large-scale graph training. First, to maintain complete neighbourhood information for the training nodes after graph partitioning, we assign the l-hop neighbours of the training nodes to the same partition. We also analyse the accuracy and runtime performance of graph training under different l-hop settings. Second, multi-layer neighbourhood sampling is performed on each partition, so that the generated mini-batches accurately train the target nodes. We study the relationship between convergence accuracy and the number of sampled layers, and find that partial neighbourhood sampling can achieve better performance than full neighbourhood sampling. Third, to further overcome the generalization error caused by large-batch training, we reduce the batch size after graph partitioning and apply the linear scaling rule in distributed optimization. We evaluate ADGraph using the GraphSage and GAT models on the ogbn-products and Reddit datasets with 32 GPUs. Experimental results show that ADGraph surpasses the benchmark accuracy of GraphSage and GAT while achieving a 24-29x speedup on 32 GPUs.
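
As an illustration of the first step, the sketch below shows one way the l-hop expansion could be implemented: after the training nodes are split across partitions, each partition is grown to include every neighbour within l hops of its training nodes, so that mini-batch construction never needs to cross a partition boundary. All names here (adjacency, train_nodes, l) are hypothetical; the abstract does not specify the paper's actual partitioner.

    from collections import deque

    def expand_partition(adjacency, train_nodes, l):
        """Return the training nodes plus all neighbours within l hops.

        Hypothetical sketch of the l-hop assignment step; `adjacency`
        maps each node to a list of its neighbours.
        """
        keep = set(train_nodes)
        frontier = deque((node, 0) for node in train_nodes)
        while frontier:  # multi-source BFS bounded at depth l
            node, depth = frontier.popleft()
            if depth == l:
                continue  # do not pull in neighbours beyond l hops
            for nbr in adjacency[node]:
                if nbr not in keep:
                    keep.add(nbr)
                    frontier.append((nbr, depth + 1))
        return keep

    # On a path graph 0-1-2-3, training node 0 with l=2 pulls in {0, 1, 2}.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(expand_partition(adj, [0], l=2))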
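
The second step, multi-layer neighbourhood sampling, is commonly realised as layer-wise fanout sampling in GraphSage-style training. The generic sketch below is an assumption about the mechanism, not the paper's code; a fanout of None stands in for full neighbourhood sampling, which the abstract reports is outperformed by partial sampling.

    import random

    def sample_layers(adjacency, seeds, fanouts):
        """Sample one block of neighbours per GNN layer, starting from `seeds`.

        `fanouts[i]` caps the neighbours drawn per node at layer i;
        None means take the full neighbourhood for that layer.
        """
        layers, frontier = [], list(seeds)
        for fanout in fanouts:
            sampled = {}
            for node in frontier:
                nbrs = adjacency[node]
                if fanout is not None and len(nbrs) > fanout:
                    nbrs = random.sample(nbrs, fanout)
                sampled[node] = list(nbrs)
            layers.append(sampled)
            # the sampled neighbours become the frontier for the next layer
            frontier = sorted({n for nbrs in sampled.values() for n in nbrs})
        return layers

    # Two-layer partial sampling with fanouts of 2 and 1 around seed node 0.
    adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    print(sample_layers(adj, seeds=[0], fanouts=[2, 1]))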
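
For the third step, the linear scaling rule keeps optimization stable when the global batch size changes: the learning rate is scaled by the same factor as the batch size. A minimal sketch with assumed values, since the abstract does not give the paper's hyper-parameters:

    def scaled_lr(base_lr, base_batch_size, world_size, per_gpu_batch_size):
        """Linear scaling rule: grow the learning rate in proportion to
        the global batch size (world_size * per_gpu_batch_size)."""
        return base_lr * (world_size * per_gpu_batch_size) / base_batch_size

    # Assumed values for illustration: a base lr of 0.01 tuned at batch 512,
    # rescaled for 32 GPUs each holding a reduced per-partition batch of 128.
    print(scaled_lr(0.01, 512, 32, 128))  # -> 0.08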