A Scheduling Approach to Incremental Maintenance of Datalog Programs
International Parallel and Distributed Processing Symposium (2020)
Abstract
In this paper, we study the problem of incremental maintenance of Datalog programs and model it as a scheduling problem on DAGs. We design provably good time- and memory-efficient scheduling algorithms for (re)executing a Datalog program where some (but not necessarily all) of the inputs have changed. We prove that our schedulers, called LevelBased and LevelBased with lookahead, have asymptotically improved running time and space efficiency when compared with benchmark algorithms used in production at LogicBlox. The main result of the paper is a hybrid scheduler, which combines LevelBased with the production LogicBlox scheduler (or any other heuristic scheduler). The hybrid scheduler achieves strong worst-case guarantees and robustness without losing out on the best-case behavior of the production LogicBlox scheduler. Our experiments show that the hybrid scheduler results in similar or improved total execution times compared to the LogicBlox scheduler, while consistently reducing the scheduling overhead, by as much as 50% on some datasets. This hybrid scheme incurs little to no overhead but provides predictability and reliability, which are crucial in a commercial application such as LogicBlox.
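To make the DAG-scheduling view concrete, here is a minimal sketch of level-by-level re-execution after some inputs change. It is an illustration only, not the paper's LevelBased algorithm, whose details the abstract does not give. Everything here is an assumption: the level of a task is taken to be the length of the longest path from a source, and the names `compute_levels`, `level_based_rerun`, and the `output_changes` callback are hypothetical.

```python
from collections import defaultdict


def compute_levels(deps):
    """Assumed level definition: longest path from a source task (sources are level 0).

    deps maps every task to the list of tasks it reads from.
    """
    levels = {}

    def level(task):
        if task not in levels:
            levels[task] = 1 + max((level(p) for p in deps[task]), default=-1)
        return levels[task]

    for task in deps:
        level(task)
    return levels


def level_based_rerun(deps, changed_inputs, output_changes):
    """Re-execute only the tasks affected by changed_inputs, one level at a time.

    output_changes(task) is a hypothetical callback that runs the task and
    reports whether its output differs from the previous run.
    Returns the tasks that were re-executed, in execution order.
    """
    levels = compute_levels(deps)

    # Invert the dependency map so a changed task can activate its consumers.
    children = defaultdict(list)
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)

    # Group the active (dirty) tasks by level.
    buckets = defaultdict(set)
    for task in changed_inputs:
        buckets[levels[task]].add(task)

    executed = []
    for lvl in range(max(levels.values()) + 1):
        # Every active task at this level has all its predecessors settled,
        # so in a parallel setting these could run concurrently.
        for task in sorted(buckets[lvl]):
            executed.append(task)
            if output_changes(task):
                for child in children[task]:
                    buckets[levels[child]].add(child)
    return executed


deps = {"a": [], "b": [], "c": ["a"], "d": ["a", "b"], "e": ["c", "d"]}
# Only "a" changed; re-running "c" yields the same output, so "e" is
# activated through "d" alone, and "b" is never touched.
order = level_based_rerun(deps, {"a"}, lambda task: task != "c")
print(order)  # ['a', 'c', 'd', 'e']
```

The point of the sketch is that a task whose predecessors did not change is never re-run, and that ordering by level avoids re-checking ancestors each time a task becomes active. How the paper's schedulers bound this overhead is not covered here.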
Keywords
Datalog programs, incremental maintenance, DAG scheduling, parallel task scheduling, databases, incremental computing, LogicBlox