Foundations of Declarative Data Analysis Using Limit Datalog Programs

IJCAI, pp. 1123-1130, 2017.

Keywords:
data analysis, limit Datalog, object domain, declarative data analysis, information system

Abstract:

Motivated by applications in declarative data analysis, we study $\mathit{Datalog}_{\mathbb{Z}}$, an extension of positive Datalog with arithmetic functions over integers. This language is known to be undecidable, so we propose two fragments. In $\mathit{limit}~\mathit{Datalog}_{\mathbb{Z}}$ predicates are axiomatised to keep minimal/maximal ...

Introduction
  • The term ‘data analysis’ covers a broad range of techniques that often involve tasks such as data aggregation, property verification, or query answering.
  • Such tasks are currently often solved imperatively by specifying how to manipulate the data, and this is undesirable because the objective of the analysis is often obscured by evaluation concerns.
  • With a declarative approach, an evaluation strategy can be chosen later, and general parallel and/or incremental evaluation algorithms can be reused ‘for free’.
Highlights
  • Analysing complex datasets is currently a hot topic in information systems.
  • We prove that fact entailment in limit $\mathit{Datalog}_{\mathbb{Z}}$ is undecidable, but, after restricting the use of multiplication, it becomes coNEXPTIME- and coNP-complete in combined and data complexity, respectively.
  • We introduce limit $\mathit{Datalog}_{\mathbb{Z}}$, where limit predicates keep bounds on numeric values.
  • In Section 1 we have shown that limit $\mathit{Datalog}_{\mathbb{Z}}$ can compute the cost of shortest paths in a graph.
  • While certain forms of aggregation can be simulated by iterating over the object domain, as in our examples in Section 3, such a solution may be too cumbersome for practical use, and it relies on the existence of a linear order over the object domain, which is a strong theoretical assumption.
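
The idea that a limit predicate keeps only a bound on numeric values, and that this is enough to compute shortest-path costs, can be illustrated with a small sketch. The code below is not the paper's algorithm: it is a naive fixpoint loop written for illustration, the names `shortest_path_costs`, `sp`, and `edge` are hypothetical, and edge costs are assumed to be non-negative so that the loop terminates.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, int]  # (source, target, non-negative cost)


def shortest_path_costs(edges: List[Edge], source: Hashable) -> Dict[Hashable, int]:
    """Naive fixpoint for the two illustrative rules
         sp(source, 0)
         sp(x, m) and edge(x, y, n) -> sp(y, m + n)
    where sp is treated as a 'min' limit predicate: for each node we store
    only the smallest value derived so far, not every derivable value.
    Assumes non-negative edge costs (otherwise the loop may not terminate).
    """
    sp: Dict[Hashable, int] = {source: 0}  # the fact sp(source, 0)
    changed = True
    while changed:  # re-apply the recursive rule until no bound improves
        changed = False
        for x, y, n in edges:
            if x in sp:
                candidate = sp[x] + n
                if y not in sp or candidate < sp[y]:
                    sp[y] = candidate  # keep only the tighter bound
                    changed = True
    return sp


edges = [("a", "b", 4), ("a", "c", 1), ("c", "b", 2), ("b", "d", 5)]
print(shortest_path_costs(edges, "a"))  # {'a': 0, 'b': 3, 'c': 1, 'd': 8}
```

Storing one bound per node, instead of one fact per derivable cost, is what keeps the set of stored facts finite even though the arithmetic ranges over all integers.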
Conclusion
  • The authors have introduced several decidable/tractable fragments of Datalog with integer arithmetic, obtaining a sound theoretical foundation for declarative data analysis.
  • Explicit support for aggregation would allow them to formulate tasks such as the ones in Section 3 more intuitively and without relying on the ordering assumption.
  • It is unclear whether integer constraint solving is strictly needed in Step 7 of Algorithm 1: it may be possible to exploit stability of $P$ to compute $T_P(J)$ more efficiently.
  • It would be interesting to establish connections between the results and existing work on data-aware artefact systems [Damaggio et al., 2012; Koutsos and Vianu, 2017], which faces similar undecidability issues in a different formal setting.
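
To make the ordering assumption mentioned in the conclusion concrete, the following sketch shows how a sum can be simulated by walking through the object domain one element at a time. It is an illustration under assumptions, not a reproduction of the examples in Section 3: the function `total_by_iteration` and the rule names `partial`, `first`, and `next` are hypothetical, and the linear order is simply passed in as a Python list.

```python
from typing import Dict, Hashable, List


def total_by_iteration(order: List[Hashable], value: Dict[Hashable, int]) -> int:
    """Sum value[o] over all objects by following a linear order, mimicking
         partial(first, value(first))
         partial(x, s) and next(x, y) -> partial(y, s + value(y))
    The result is the partial sum attached to the last object in the order.
    """
    if not order:
        return 0
    partial = {order[0]: value[order[0]]}  # base rule for the first object
    for x, y in zip(order, order[1:]):  # each pair plays the role of next(x, y)
        partial[y] = partial[x] + value[y]
    return partial[order[-1]]


print(total_by_iteration(["a", "b", "c"], {"a": 2, "b": 5, "c": 1}))  # 8
```

The sketch only works because `order` lists every object exactly once in a fixed sequence, which is the kind of linear order that explicit aggregation support would make unnecessary.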
Funding
  • This research was supported by the Royal Society and the EPSRC projects DBOnto, MaSI3, and ED3.
Reference
  • [Alvaro et al., 2010] Peter Alvaro, Tyson Condie, Neil Conway, Khaled Elmeleegy, Joseph M. Hellerstein, and Russell Sears. BOOM analytics: exploring data-centric, declarative programming for the cloud. In EuroSys. ACM, 2010.
  • [Beeri et al., 1991] Catriel Beeri, Shamim A. Naqvi, Oded Shmueli, and Shalom Tsur. Set constructors in a logic database language. J. Log. Program., 10(3&4), 1991.
  • [Berman, 1980] Leonard Berman. The complexity of logical theories. Theor. Comput. Sci., 11, 1980.
  • [Chin et al., 2015] Brian Chin, Daniel von Dincklage, Vuk Ercegovac, Peter Hawkins, Mark S. Miller, Franz Josef Och, Christopher Olston, and Fernando Pereira. Yedalog: Exploring knowledge at scale. In SNAPL, 2015.
  • [Chistikov and Haase, 2016] Dmitry Chistikov and Christoph Haase. The taming of the semi-linear set. In ICALP, 2016.
  • [Consens and Mendelzon, 1993] Mariano P. Consens and Alberto O. Mendelzon. Low complexity aggregation in GraphLog and Datalog. Theor. Comput. Sci., 116(1), 1993.
  • [Damaggio et al., 2012] Elio Damaggio, Alin Deutsch, and Victor Vianu. Artifact systems with data dependencies and arithmetic. ACM Trans. Database Syst., 37(3):22:1–22:36, 2012.
  • [Dantsin et al., 2001] Evgeny Dantsin, Thomas Eiter, Georg Gottlob, and Andrei Voronkov. Complexity and expressive power of logic programming. ACM Comput. Surv., 33(3), 2001.
  • [Eisner and Filardo, 2011] Jason Eisner and Nathaniel Wesley Filardo. Dyna: Extending datalog for modern AI. In Datalog, 2011.
  • [Faber et al., 2011] Wolfgang Faber, Gerald Pfeifer, and Nicola Leone. Semantics and complexity of recursive aggregates in answer set programming. Artif. Intell., 175(1), 2011.
  • [Ganguly et al., 1995] Sumit Ganguly, Sergio Greco, and Carlo Zaniolo. Extrema predicates in deductive databases. J. Comput. Syst. Sci., 51(2), 1995.
  • [Grädel, 1988] Erich Grädel. Subclasses of Presburger arithmetic and the polynomial-time hierarchy. Theor. Comput. Sci., 56, 1988.
  • [Haase, 2014] Christoph Haase. Subclasses of Presburger arithmetic and the weak EXP hierarchy. In CSL-LICS, 2014.
  • [Kaminski et al., 2017] Mark Kaminski, Bernardo Cuenca Grau, Egor V. Kostylev, Boris Motik, and Ian Horrocks. Foundations of declarative data analysis using limit datalog programs. CoRR, abs/1705.06927, 2017.
  • [Kemp and Stuckey, 1991] David B. Kemp and Peter J. Stuckey. Semantics of logic programs with aggregates. In ISLP, 1991.
  • [Koutsos and Vianu, 2017] Adrien Koutsos and Victor Vianu. Process-centric views of data-driven business artifacts. J. Comput. System Sci., 86:82–107, 2017.
  • [Loo et al., 2009] Boon Thau Loo, Tyson Condie, Minos N. Garofalakis, David E. Gay, Joseph M. Hellerstein, Petros Maniatis, Raghu Ramakrishnan, Timothy Roscoe, and Ion Stoica. Declarative networking. Commun. ACM, 52(11), 2009.
  • [Markl, 2014] Volker Markl. Breaking the chains: On declarative data analysis and data independence in the big data era. PVLDB, 7(13), 2014.
  • [Mazuran et al., 2013] Mirjana Mazuran, Edoardo Serra, and Carlo Zaniolo. Extending the power of datalog recursion. VLDB J., 22(4), 2013.
  • [Mumick et al., 1990] Inderpal Singh Mumick, Hamid Pirahesh, and Raghu Ramakrishnan. The magic of duplicates and aggregates. In VLDB, pages 264–277, 1990.
  • [Ross and Sagiv, 1997] Kenneth A. Ross and Yehoshua Sagiv. Monotonic aggregation in deductive databases. J. Comput. System Sci., 54(1), 1997.
  • [Schöning, 1997] Uwe Schöning. Complexity of Presburger arithmetic with fixed quantifier dimension. Theory Comput. Syst., 30(4), 1997.
  • [Seo et al., 2015] Jiwon Seo, Stephen Guo, and Monica S. Lam. SociaLite: An efficient graph query language based on datalog. IEEE Trans. Knowl. Data Eng., 27(7), 2015.
  • [Shkapsky et al., 2016] Alexander Shkapsky, Mohan Yang, Matteo Interlandi, Hsuan Chiu, Tyson Condie, and Carlo Zaniolo. Big data analytics with datalog queries on Spark. In SIGMOD. ACM, 2016.
  • [Van Gelder, 1992] Allen Van Gelder. The well-founded semantics of aggregation. In PODS, 1992.
  • [Wang et al., 2015] Jingjing Wang, Magdalena Balazinska, and Daniel Halperin. Asynchronous and fault-tolerant recursive datalog evaluation in shared-nothing engines. PVLDB, 8(12), 2015.
Best Paper
Best Paper of IJCAI, 2017