Exploring MPI Collective I/O and File-per-Process I/O for Checkpointing a Logical Inference Task

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021

Abstract
We present a scalable parallel I/O system for a logical-inferencing application built atop a deductive database. Deductive databases can make logical deductions (i.e., conclude additional facts) from the facts already in the database, based on a set of program rules. Datalog is a language, or family of languages, commonly used to specify rules and queries for a deductive database. Applications built using Datalog range from graph mining (such as computing transitive closure or k-cliques) to program analysis (control- and data-flow analysis). In our previous papers, we presented the first implementation of a data-parallel Datalog built using MPI. In this paper, we present a parallel I/O system used to checkpoint and restart applications built on top of our Datalog system. State-of-the-art Datalog implementations, such as Souffle, support only serial I/O, mainly because those implementations do not themselves support many-node parallel execution. Computing the transitive closure of a graph is one of the simplest logical-inferencing applications built using Datalog; we use it as a micro-benchmark to demonstrate the efficacy of our parallel I/O system. Internally, we use a nested B-tree data structure to provide fast and efficient in-memory access to relational data. Our I/O system therefore involves two steps: converting the application data layout (a nested B-tree) into a stream of bytes, followed by the actual parallel I/O. We explore two popular I/O techniques: POSIX I/O and MPI collective I/O. To extract performance from MPI collective I/O, we use adaptive striping; for POSIX I/O, we use file-per-process I/O. We demonstrate the scalability of our system on up to 4,096 processes on the Theta supercomputer at Argonne National Laboratory.
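The two-step I/O path described above (flatten the in-memory relation into bytes, then write) can be sketched in C as follows. This is a minimal, single-process sketch under stated assumptions, not the authors' implementation. `nested_entry`, `serialize_relation`, `shared_file_offsets`, and `write_file_per_process` are hypothetical names. The nested B-tree is simplified to a flat array of first-column keys, each owning its second-column values. Tuples are assumed to be pairs of 64-bit integers, and MPI ranks are simulated with plain arrays. In a real MPI program, the shared-file offsets would typically be computed with `MPI_Exscan` and passed to `MPI_File_write_at_all`. The file-per-process path uses plain POSIX `open`/`write`, one file per rank. Adaptive striping is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for one process's nested B-tree: an outer index
   keyed by the first column, each key owning its second-column values. */
typedef struct {
    uint64_t key;
    const uint64_t *vals;
    size_t nvals;
} nested_entry;

/* Step 1: flatten the nested layout into a contiguous byte stream of
   (key, value) pairs, 16 bytes per tuple. Caller frees *out. */
static size_t serialize_relation(const nested_entry *rel, size_t nkeys,
                                 unsigned char **out) {
    *out = NULL;
    size_t ntuples = 0;
    for (size_t i = 0; i < nkeys; i++)
        ntuples += rel[i].nvals;

    size_t nbytes = ntuples * 2 * sizeof(uint64_t);
    unsigned char *buf = malloc(nbytes > 0 ? nbytes : 1);
    if (buf == NULL)
        return 0;

    size_t pos = 0;
    for (size_t i = 0; i < nkeys; i++) {
        for (size_t j = 0; j < rel[i].nvals; j++) {
            memcpy(buf + pos, &rel[i].key, sizeof(uint64_t));
            pos += sizeof(uint64_t);
            memcpy(buf + pos, &rel[i].vals[j], sizeof(uint64_t));
            pos += sizeof(uint64_t);
        }
    }
    assert(pos == nbytes);
    *out = buf;
    return nbytes;
}

/* Step 2a (one shared file, collective I/O): an exclusive prefix sum of the
   per-rank byte counts gives each rank's starting offset in the file.
   Returns the total file size. Ranks are simulated with an array here. */
static size_t shared_file_offsets(const size_t *counts, size_t nranks,
                                  size_t *offsets) {
    size_t running = 0;
    for (size_t r = 0; r < nranks; r++) {
        offsets[r] = running;
        running += counts[r];
    }
    return running;
}

/* Step 2b (file-per-process POSIX I/O): each rank writes its own file
   "<prefix>.<rank>", so no offsets need to be coordinated across ranks. */
static int write_file_per_process(const char *prefix, int rank,
                                  const unsigned char *buf, size_t nbytes) {
    char path[256];
    snprintf(path, sizeof path, "%s.%d", prefix, rank);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t done = 0;
    while (done < nbytes) {  /* write() may return a partial count */
        ssize_t n = write(fd, buf + done, nbytes - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    return close(fd);
}
```

For example, a rank holding the tuples (1, 2), (1, 3), and (2, 4) serializes them into 48 bytes. If three ranks produce 48, 16, and 32 bytes, their shared-file offsets are 0, 48, and 64. The sketch makes the trade-off visible. The shared file needs a global offset computation (a collective step) but yields a single checkpoint file. File-per-process needs no coordination but produces one file per rank, so 4,096 processes write 4,096 files per checkpoint.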
Keywords
nested B-tree data-structure, relational data, application data-layout, file-per-process, logical inference task, deductive database, logical deductions, program rules, language, queries, transitive closure, program analysis, data-flow analysis, data-parallel Datalog, Datalog system, parallel execution, Theta supercomputer, file-per-process I/O, checkpoint, graph mining, MPI