PK-Graph: Partitioned $k^2$-Trees to Enable Compact and Dynamic Graphs in Spark GraphX

Bruno Morais, Miguel E. Coimbra, Luís Veiga

Cooperative Information Systems, Lecture Notes in Computer Science (2022)

Abstract

Graphs are becoming increasingly large, with datasets having millions of vertices and billions (or even trillions) of edges. As a result, fitting an entire graph into the main memory of a single machine is challenging on common hardware, and even more so on edge/IoT-like devices (i.e., more energy efficient but also more resource constrained). Reading the graph from secondary storage may in itself impose significant overhead, negatively impacting query performance and storage requirements. It thus becomes relevant to explore techniques that optimize the storage of graphs, especially in memory, in a way that circumvents space limitations without compromising processing performance. We observe that current graph storage systems store graphs in an uncompressed format, either: i) in a shared architecture, which leads to higher space overhead and the inability to represent the graph entirely in main memory, or ii) in a distributed architecture, where the graph dataset is partitioned over a cluster of machines, each storing in main memory only a fragment (shard) of the (uncompressed) graph. We present PK-GRAPH, our proposal, which extends a distributed graph processing system widely used in academia and industry (Spark GraphX) to employ a compressed graph representation, with added support for dynamically updatable graphs (not currently supported in GraphX). Our experimental results show that PK-GRAPH can achieve up to 50% lower graph memory usage, while maintaining competitive performance in executing typical graph operations used in common applications.

Keywords

Graph representation, Graph databases, Graph processing systems, Optimization, Compression
