P3: Distributed Deep Graph Learning at Scale

PROCEEDINGS OF THE 15TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI '21), 2021

Abstract
Graph Neural Networks (GNNs) have gained significant attention in recent years and have become one of the fastest-growing subareas of deep learning. While several new GNN architectures have been proposed, the scale of real-world graphs (in many cases billions of nodes and edges) poses challenges during model training. In this paper, we present P3, a system that focuses on scaling GNN model training to large real-world graphs in a distributed setting. We observe that the scalability challenges in training GNNs are fundamentally different from those in training classical deep neural networks and in distributed graph processing, and that commonly used techniques, such as intelligent partitioning of the graph, do not yield the desired results. Based on this observation, P3 proposes a new approach to distributed GNN training. Our approach effectively eliminates high communication and partitioning overheads, and couples this with a new pipelined push-pull parallelism based execution strategy for fast model training. P3 exposes a simple API that captures many different classes of GNN architectures for generality. When further combined with a simple caching strategy, our evaluation shows that P3 outperforms existing state-of-the-art distributed GNN frameworks by up to 7x.
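To make the push-pull idea from the abstract concrete, below is a minimal single-process sketch of the underlying intuition: node features are split along the feature dimension across workers, each worker computes a partial neighborhood aggregation on its slice only (the "push"), and the results are then combined at the owner of the target nodes (the "pull"), so full neighbor features never need to be shipped around. This is an illustrative sketch only; the names (NUM_WORKERS, partials, etc.), the dense adjacency matrix, and the single-process simulation are assumptions for clarity and are not the paper's actual API or distributed implementation.

```python
import numpy as np

NUM_WORKERS = 4
NUM_NODES = 8
FEAT_DIM = 16  # assumed divisible by NUM_WORKERS in this sketch

rng = np.random.default_rng(0)
features = rng.normal(size=(NUM_NODES, FEAT_DIM))
# Toy adjacency matrix; kept dense only for brevity.
adj = (rng.random((NUM_NODES, NUM_NODES)) < 0.3).astype(float)

# "Push" phase: each worker holds a slice of the feature dimensions
# and computes a partial neighborhood aggregation on that slice only.
slice_size = FEAT_DIM // NUM_WORKERS
partials = []
for w in range(NUM_WORKERS):
    local_slice = features[:, w * slice_size:(w + 1) * slice_size]
    partials.append(adj @ local_slice)  # partial aggregate on worker w

# "Pull" phase: the owner of the target nodes combines the small
# partial results instead of pulling full neighbor features.
aggregated = np.concatenate(partials, axis=1)

# Sanity check: identical to aggregating over the full feature matrix.
assert np.allclose(aggregated, adj @ features)
print("partial aggregation matches full aggregation")
```

In a real distributed setting, the point of this split is that each machine only ever communicates small partial aggregates for the target nodes rather than large raw feature vectors for every sampled neighbor, which is the communication cost that graph partitioning alone does not remove.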
Keywords
Deep learning, Graph neural networks, Scalability, Distributed computing, Parallelism, Artificial intelligence, Learning at scale