High-Performance Data Analytics Beyond The Relational And Graph Data Models With Gems

2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW)(2017)

引用 7|浏览83
暂无评分
摘要
Graphs represent an increasingly popular data model for data-analytics, since they can naturally represent relationships and interactions between entities. Relational databases and their pure table-based data model are not well suitable to store and process sparse data. Consequently, graph databases have gained interest in the last few years and the Resource Description Framework (RDF) became the standard data model for graph data. Nevertheless, while RDF is well suited to analyze the relationships between the entities, it is not efficient in representing their attributes and properties. In this work we propose the adoption of a new hybrid data model, based on attributed graphs, that aims at overcoming the limitations of the pure relational and graph data models. We present how we have re-designed the GEMS data-analytics framework to fully take advantage of the proposed hybrid data model. To improve analysts productivity, in addition to a C++ API for applications development, we adopt GraQL as input query language. We validate our approach implementing a set of queries on net-flow data and we compare our framework performance against Neo4j. Experimental results show significant performance improvement over Neo4j, up to several orders of magnitude when increasing the size of the input data.
更多
查看译文
关键词
Data Analytics,Graph Database,Big Data,Graphs,High Performance Computing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要