Scaling Graph Traversal to 281 Trillion Edges with 40 Million Cores

Huanqi Cao,Yuanwei Wang,Haojie Wang,Heng Lin,Zixuan Ma,Wanwang Yin,Wenguang Chen

PPOPP'22: PROCEEDINGS OF THE 27TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING（2022）

引用 12|浏览38

暂无评分

摘要

Graph processing, especially high-performance graph traversal, plays a more and more important role in data analytics. The successor of Sunway TaihuLight, NEW SUNWAY, is equipped with nearly 10 PB memory and over 40 million cores, which brings the opportunity to process hundreds of trillions of edges graphs. However, the graph with an unprecedented scale also brings severe performance challenges, including load imbalance, poor locality, and irregular access of graph traversal workload. To address the scalability problem, we propose a novel 3-level degree-aware 1.5D graph partitioning, which benefits from both delegated 1D and 2D partitioning. By delegating extremely heavy vertices globally and other heavy vertices on columns and rows in the processes mesh, we break the scalability wall of previous partitioning methods. Together with sub-iteration direction optimization, core group -aware core subgraph segmenting, and a new on-chip sorting mechanism using RMA, we achieve 180,792 GTEPS on a graph with 281 trillion edges, using 103,912 processors with over 40 million cores, achieving 1.75x performance and 8x capacity compared to the previous state of the art and conforming to the Graph 500 BFS benchmark[14].

查看译文

关键词

massively parallel algorithm,breadth-first search,heterogeneous architecture

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要