Functional Dependencies with Predicates: What Makes the $g_3$-error Easy to Compute?

Simon Vilmin,Pierre Faure--Giovagnoli,Jean-Marc Petit,Vasile-Marian Scuturici

arXiv (Cornell University)（2023）

引用 3|浏览12

暂无评分

摘要

The notion of functional dependencies (FDs) can be used by data scientists and domain experts to confront background knowledge against data. To overcome the classical, too restrictive, satisfaction of FDs, it is possible to replace equality with more meaningful binary predicates, and use a coverage measure such as the $g_3$-error to estimate the degree to which a FD matches the data. It is known that the $g_3$-error can be computed in polynomial time if equality is used, but unfortunately, the problem becomes NP-complete when relying on more general predicates instead. However, there has been no analysis of which class of predicates or which properties alter the complexity of the problem, especially when going from equality to more general predicates. In this work, we provide such an analysis. We focus on the properties of commonly used predicates such as equality, similarity relations, and partial orders. These properties are: reflexivity, transitivity, symmetry, and antisymmetry. We show that symmetry and transitivity together are sufficient to guarantee that the $g_3$-error can be computed in polynomial time. However, dropping either of them makes the problem NP-complete.

查看译文

关键词

predicates

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要