# Foundations of Declarative Data Analysis Using Limit Datalog Programs

IJCAI, pp. 1123-1130, 2017.

Keywords:

data analysis, limit Datalog, object domain, declarative data analysis, information system

Abstract:

Motivated by applications in declarative data analysis, we study $\mathit{Datalog}_{\mathbb{Z}}$, an extension of positive Datalog with arithmetic functions over integers. This language is known to be undecidable, so we propose two fragments. In $\mathit{limit}~\mathit{Datalog}_{\mathbb{Z}}$ predicates are axiomatised to keep minimal/maximal ...

Introduction

- The term ‘data analysis’ covers a broad range of techniques that often involve tasks such as data aggregation, property verification, or query answering.
- Such tasks are currently often solved imperatively by specifying how to manipulate the data, and this is undesirable because the objective of the analysis is often obscured by evaluation concerns.
- An evaluation strategy can be chosen later, and general parallel and/or incremental evaluation algorithms can be reused ‘for free’.

Highlights

- Analysing complex datasets is currently a hot topic in information systems.
- We prove that fact entailment in limit $\mathit{Datalog}_{\mathbb{Z}}$ is undecidable, but, after restricting the use of multiplication, it becomes coNEXPTIME- and coNP-complete in combined and data complexity, respectively.
- We introduce limit $\mathit{Datalog}_{\mathbb{Z}}$, where limit predicates keep bounds on numeric values.
- In Section 1 we have shown that limit $\mathit{Datalog}_{\mathbb{Z}}$ can compute the cost of shortest paths in a graph.
- While certain forms of aggregation can be simulated by iterating over the object domain, as in our examples in Section 3, such a solution may be too cumbersome for practical use, and it relies on the existence of a linear order over the object domain, which is a strong theoretical assumption
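
To make the idea of a limit predicate concrete, here is a minimal Python sketch of the shortest-path highlight. It is an illustration and not the paper's algorithm. It assumes two rules of the form `sp(source, 0)` and `sp(y, m + n) <- sp(x, m), edge(x, y, n)`, with `sp` treated as a "min" limit predicate, so that only the smallest value derived for each node is kept. The function name `shortest_path_costs`, the edge encoding, and the restriction to non-negative weights are assumptions made for this example.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical encoding of edge(x, y, n) facts: (source node, target node, cost).
Edge = Tuple[Hashable, Hashable, int]


def shortest_path_costs(edges: Iterable[Edge], source: Hashable) -> Dict[Hashable, int]:
    """Naive fixpoint evaluation of the two assumed rules

        sp(source, 0).
        sp(y, m + n) <- sp(x, m), edge(x, y, n).

    where sp behaves like a 'min' limit predicate: for every node we store
    only the smallest value derived so far instead of all derivable values.
    """
    edges = list(edges)
    # Assumption of this sketch: non-negative weights, so the loop terminates.
    if any(n < 0 for _, _, n in edges):
        raise ValueError("this sketch assumes non-negative edge weights")

    sp: Dict[Hashable, int] = {source: 0}  # first rule: sp(source, 0)
    changed = True
    while changed:  # repeat rule application until nothing improves
        changed = False
        for x, y, n in edges:
            if x in sp:
                candidate = sp[x] + n  # second rule derives sp(y, m + n)
                # Keep the new value only if it tightens the stored bound.
                if y not in sp or candidate < sp[y]:
                    sp[y] = candidate
                    changed = True
    return sp


if __name__ == "__main__":
    graph = [("a", "b", 2), ("b", "c", 3), ("a", "c", 10), ("c", "a", 1)]
    print(shortest_path_costs(graph, "a"))
```

Storing one bound per node is what keeps the computed model finite even though the rules mention arbitrary integers; nodes that are unreachable from the source simply have no entry.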

Conclusion

**Conclusion and Future Work**

- The authors have introduced several decidable/tractable fragments of Datalog with integer arithmetic, obtaining a sound theoretical foundation for declarative data analysis.
- Explicit support for aggregation would allow them to formulate tasks such as the ones in Section 3 more intuitively and without relying on the ordering assumption.
- It is unclear whether integer constraint solving is strictly needed in Step 7 of Algorithm 1: it may be possible to exploit stability of $P$ to compute $T_P(J)$ more efficiently.
- It would be interesting to establish connections between the results and existing work on data-aware artefact systems [Damaggio et al., 2012; Koutsos and Vianu, 2017], which faces similar undecidability issues in a different formal setting.

Funding

- This research was supported by the Royal Society and the EPSRC projects DBOnto, MaSI3, and ED3

References

- [Alvaro et al., 2010] Peter Alvaro, Tyson Condie, Neil Conway, Khaled Elmeleegy, Joseph M. Hellerstein, and Russell Sears. BOOM analytics: exploring data-centric, declarative programming for the cloud. In EuroSys. ACM, 2010.
- [Beeri et al., 1991] Catriel Beeri, Shamim A. Naqvi, Oded Shmueli, and Shalom Tsur. Set constructors in a logic database language. J. Log. Program., 10(3&4), 1991.
- [Berman, 1980] Leonard Berman. The complexity of logical theories. Theor. Comput. Sci., 11, 1980.
- [Chin et al., 2015] Brian Chin, Daniel von Dincklage, Vuk Ercegovac, Peter Hawkins, Mark S. Miller, Franz Josef Och, Christopher Olston, and Fernando Pereira. Yedalog: Exploring knowledge at scale. In SNAPL, 2015.
- [Chistikov and Haase, 2016] Dmitry Chistikov and Christoph Haase. The taming of the semi-linear set. In ICALP, 2016.
- [Consens and Mendelzon, 1993] Mariano P. Consens and Alberto O. Mendelzon. Low complexity aggregation in GraphLog and Datalog. Theor. Comput. Sci., 116(1), 1993.
- [Damaggio et al., 2012] Elio Damaggio, Alin Deutsch, and Victor Vianu. Artifact systems with data dependencies and arithmetic. ACM Trans. Database Syst., 37(3):22:1–22:36, 2012.
- [Dantsin et al., 2001] Evgeny Dantsin, Thomas Eiter, Georg Gottlob, and Andrei Voronkov. Complexity and expressive power of logic programming. ACM Comput. Surv., 33(3), 2001.
- [Eisner and Filardo, 2011] Jason Eisner and Nathaniel Wesley Filardo. Dyna: Extending datalog for modern AI. In Datalog, 2011.
- [Faber et al., 2011] Wolfgang Faber, Gerald Pfeifer, and Nicola Leone. Semantics and complexity of recursive aggregates in answer set programming. Artif. Intell., 175(1), 2011.
- [Ganguly et al., 1995] Sumit Ganguly, Sergio Greco, and Carlo Zaniolo. Extrema predicates in deductive databases. J. Comput. Syst. Sci., 51(2), 1995.
- [Grädel, 1988] Erich Grädel. Subclasses of Presburger arithmetic and the polynomial-time hierarchy. Theor. Comput. Sci., 56, 1988.
- [Haase, 2014] Christoph Haase. Subclasses of Presburger arithmetic and the weak EXP hierarchy. In CSL-LICS, 2014.
- [Kaminski et al., 2017] Mark Kaminski, Bernardo Cuenca Grau, Egor V. Kostylev, Boris Motik, and Ian Horrocks. Foundations of declarative data analysis using limit datalog programs. CoRR, abs/1705.06927, 2017.
- [Kemp and Stuckey, 1991] David B. Kemp and Peter J. Stuckey. Semantics of logic programs with aggregates. In ISLP, 1991.
- [Koutsos and Vianu, 2017] Adrien Koutsos and Victor Vianu. Process-centric views of data-driven business artifacts. J. Comput. System Sci., 86:82–107, 2017.
- [Loo et al., 2009] Boon Thau Loo, Tyson Condie, Minos N. Garofalakis, David E. Gay, Joseph M. Hellerstein, Petros Maniatis, Raghu Ramakrishnan, Timothy Roscoe, and Ion Stoica. Declarative networking. Commun. ACM, 52(11), 2009.
- [Markl, 2014] Volker Markl. Breaking the chains: On declarative data analysis and data independence in the big data era. PVLDB, 7(13), 2014.
- [Mazuran et al., 2013] Mirjana Mazuran, Edoardo Serra, and Carlo Zaniolo. Extending the power of datalog recursion. VLDB J., 22(4), 2013.
- [Mumick et al., 1990] Inderpal Singh Mumick, Hamid Pirahesh, and Raghu Ramakrishnan. The magic of duplicates and aggregates. In VLDB, pages 264–277, 1990.
- [Ross and Sagiv, 1997] Kenneth A. Ross and Yehoshua Sagiv. Monotonic aggregation in deductive databases. J. Comput. System Sci., 54(1), 1997.
- [Schöning, 1997] Uwe Schöning. Complexity of Presburger arithmetic with fixed quantifier dimension. Theory Comput. Syst., 30(4), 1997.
- [Seo et al., 2015] Jiwon Seo, Stephen Guo, and Monica S. Lam. SociaLite: An efficient graph query language based on datalog. IEEE Trans. Knowl. Data Eng., 27(7), 2015.
- [Shkapsky et al., 2016] Alexander Shkapsky, Mohan Yang, Matteo Interlandi, Hsuan Chiu, Tyson Condie, and Carlo Zaniolo. Big data analytics with datalog queries on Spark. In SIGMOD. ACM, 2016.
- [Van Gelder, 1992] Allen Van Gelder. The well-founded semantics of aggregation. In PODS, 1992.
- [Wang et al., 2015] Jingjing Wang, Magdalena Balazinska, and Daniel Halperin. Asynchronous and fault-tolerant recursive datalog evaluation in shared-nothing engines. PVLDB, 8(12), 2015.

Best Paper

Best Paper of IJCAI, 2017
