Revisiting Conditional Functional Dependency Discovery: Splitting The "C" From The "Fd"
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2018, PT II(2018)
摘要
Many techniques for cleaning dirty data are based on enforcing some set of integrity constraints. Conditional functional dependencies (CFDs) are a combination of traditional Functional dependencies (FDs) and association rules, and are widely used as a constraint formalism for data cleaning. However, the discovery of such CFDs has received limited attention. In this paper, we regard CFDs as an extension of association rules, and present three general methodologies for (approximate) CFD discovery, each using a different way of combining pattern mining for discovering the conditions (the "C" in CFD) with FD discovery. We discuss how existing algorithms fit into these three methodologies, and introduce new techniques to improve the discovery process. We show that the right choice of methodology improves performance over the traditional CFD discovery method CTane. Code related to this paper is available at: https://github.com/j-r77/cfddiscovery, https://codeocean.com/2018/06/20/discovering-conditional-functional-dependencies/code.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络