ARGO: An Auto-Tuning Runtime System for Scalable GNN Training on Multi-Core Processor
CoRR(2024)
Abstract
As Graph Neural Networks (GNNs) have grown in popularity, libraries such as
PyTorch-Geometric (PyG) and Deep Graph Library (DGL) have emerged as the de
facto standard for implementing GNNs because they provide graph-oriented APIs
and are purposefully designed to manage the inherent sparsity and irregularity
in graph structures. However, these
libraries show poor scalability on multi-core processors, which under-utilizes
the available platform resources and limits performance. This is because
GNN training is a resource-intensive workload with a high volume of irregular
data accesses, and existing libraries fail to utilize the memory bandwidth
efficiently. To address this challenge, we propose ARGO, a novel runtime system
for GNN training that offers scalable performance. ARGO exploits
multi-processing and core-binding techniques to improve platform resource
utilization. We further develop an auto-tuner that searches for the optimal
configuration for multi-processing and core-binding. The auto-tuner works
automatically, making it completely transparent to the user. Furthermore, the
auto-tuner allows ARGO to adapt to various platforms, GNN models, datasets,
etc. We evaluate ARGO on two representative GNN models and four widely-used
datasets on two platforms. With the proposed auto-tuner, ARGO is able to select
a near-optimal configuration by exploring only 5% of the design space. ARGO
speeds up state-of-the-art GNN libraries by up to 5.06x and 4.54x on a
four-socket Ice Lake machine with 112 cores and a two-socket Sapphire Rapids
machine with 64 cores, respectively. Finally, ARGO can seamlessly integrate
into widely-used GNN libraries (e.g., DGL, PyG) with a few lines of code and
speed up GNN training.
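
To make the ideas of core-binding and configuration search more concrete, the following is a minimal sketch and not ARGO's actual implementation or API. The function names `partition_cores`, `bind_current_process`, and `budgeted_search` are hypothetical. The search shown is plain random sampling under a fixed budget, used only as a stand-in, since the abstract does not say how the auto-tuner explores the space. Binding relies on `os.sched_setaffinity`, which is available on Linux only.

```python
import os
import random


def partition_cores(cores, num_procs):
    """Split a list of core IDs into num_procs disjoint, contiguous groups."""
    if num_procs < 1 or num_procs > len(cores):
        raise ValueError("num_procs must be between 1 and len(cores)")
    base, extra = divmod(len(cores), num_procs)
    groups, start = [], 0
    for i in range(num_procs):
        # The first `extra` groups receive one additional core.
        size = base + (1 if i < extra else 0)
        groups.append(cores[start:start + size])
        start += size
    return groups


def bind_current_process(core_ids):
    """Pin the calling process to core_ids where the OS supports it."""
    if hasattr(os, "sched_setaffinity"):  # Linux only
        os.sched_setaffinity(0, set(core_ids))
        return True
    return False  # no binding is performed on other platforms


def budgeted_search(configs, measure, fraction=0.05, seed=0):
    """Evaluate a random fraction of configs and return the best one found.

    Assumption: random sampling is a placeholder for a real tuning strategy.
    `measure` returns a cost (e.g. epoch time); lower is better.
    """
    budget = max(1, int(len(configs) * fraction))
    sampled = random.Random(seed).sample(configs, budget)
    return min(sampled, key=measure)
```

In such a setup, each training process would call `bind_current_process` with its own group from `partition_cores`, and `measure` would time a short training run for a candidate configuration such as the number of processes.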