Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency

IPDPS(2012)

Abstract
Graph-based structures are increasingly used to model data and the relations among data in a number of fields, and graph-based databases are becoming more popular as a means to better represent such data. Graph traversal is a key component of graph algorithms such as reachability and graph matching. Since the scale of data stored and queried in these databases is increasing, it is important to obtain high-performing implementations of graph traversal that can efficiently utilize the processing power of modern processors. In this work, we present a scalable breadth-first search (BFS) traversal algorithm for modern multi-socket, multi-core CPUs. Our algorithm uses lock- and atomic-free operations on a cache-resident structure for arbitrarily sized graphs to filter out expensive main memory accesses, and completely and efficiently utilizes all available bandwidth resources. We propose a work distribution approach for multi-socket platforms that ensures load balancing while keeping cross-socket communication low. We provide a detailed analytical model that accurately projects the performance of our single- and multi-socket traversal algorithms to within 5-10% of obtained performance. Our analytical model serves as a useful tool to analyze performance bottlenecks on modern CPUs. When measured on various synthetic and real-world graphs with a wide range of graph sizes, vertex degrees, and graph diameters, our implementation on a dual-socket Intel(R) Xeon(R) X5570 (Intel microarchitecture code name Nehalem) system achieves a 1.5X to 13.2X performance speedup over the best reported numbers. We achieve around 1 billion traversed edges per second on a scale-free R-MAT graph with 64M vertices and 2 billion edges on a dual-socket Nehalem system. Our optimized algorithm is useful as a building block for efficient multi-node implementations and future exascale systems, thereby allowing them to ride the trend of increasing per-node compute and bandwidth resources.
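To make the idea of filtering memory accesses through a compact structure concrete, the following is a minimal single-threaded sketch of a level-synchronous BFS that checks a one-bit-per-vertex bitmap before writing to the larger depth array. It is an illustration only, not the paper's implementation: the abstract does not specify the structure's layout, and the lock- and atomic-free multithreading and the multi-socket work distribution are not reproduced here. The CSR input format and the names `bfs_levels`, `offsets`, `neighbors`, and `test_and_set` are assumptions chosen for this example.

```python
def bfs_levels(offsets, neighbors, source):
    """Level-synchronous BFS over a CSR graph (hypothetical helper).

    offsets[u]..offsets[u + 1] delimits the neighbors of vertex u.
    Returns the BFS depth of every vertex, or -1 if unreachable.
    """
    n = len(offsets) - 1
    depth = [-1] * n
    # Assumed filter structure: one bit per vertex, small enough to
    # stay cache-resident for far larger graphs than the depth array.
    visited = bytearray((n + 7) // 8)

    def test_and_set(v):
        # Return True if v was already marked; mark it either way.
        byte, mask = v >> 3, 1 << (v & 7)
        seen = visited[byte] & mask
        visited[byte] |= mask
        return bool(seen)

    test_and_set(source)
    depth[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        for u in frontier:
            for i in range(offsets[u], offsets[u + 1]):
                v = neighbors[i]
                if test_and_set(v):
                    continue  # filtered: no access to the depth array
                depth[v] = level
                next_frontier.append(v)
        frontier = next_frontier
    return depth


# Small undirected example: edges 0-1, 0-2, 1-3, 2-3; vertex 4 is isolated.
example_offsets = [0, 2, 4, 6, 8, 8]
example_neighbors = [1, 2, 0, 3, 0, 3, 1, 2]
example_depths = bfs_levels(example_offsets, example_neighbors, 0)
```

In this sketch the bitmap answers the "already visited?" question for most edges, so the depth array is written only once per newly discovered vertex.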
Keywords
efficient graph traversal algorithm, performance speedup, model data, graph size, analytical model, performance bottleneck, graph diameter, scale free r-mat graph, real-world graph, graph traversal, maximizing single-node efficiency, graph matching, data structures, resource allocation, complex networks, load balancing, reachability, bandwidth, data representation, instruction sets, data modeling