DFD: Efficient Functional Dependency Discovery.
CIKM '14: 2014 ACM Conference on Information and Knowledge Management, Shanghai, China, November 2014
Abstract
The discovery of unknown functional dependencies in a dataset is of great importance for database redesign, anomaly detection, and data cleansing applications. However, because the problem is exponential in the number of attributes, none of the existing approaches can be applied to large datasets. We present a new algorithm, DFD, for discovering all functional dependencies in a dataset. It follows a depth-first traversal of the attribute lattice that combines aggressive pruning with efficient result verification. Our approach scales far beyond existing algorithms, to datasets of up to 7.5 million tuples, and is up to three orders of magnitude faster than existing approaches on smaller datasets.
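To make the problem statement concrete, the following is a minimal sketch of what discovering minimal functional dependencies means. It is not the DFD algorithm: it uses a naive level-wise enumeration of attribute subsets instead of a depth-first lattice traversal, and its only pruning is skipping supersets of an already-found left-hand side. The function names `holds` and `minimal_fds` and the sample rows are hypothetical, chosen for illustration only.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the attributes in lhs functionally determine rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def minimal_fds(rows, attributes):
    """Naive level-wise search for minimal FDs (illustrative, not DFD).

    Assumption: empty left-hand sides (constant columns) are ignored.
    """
    found = []
    for rhs in attributes:
        others = [a for a in attributes if a != rhs]
        minimal_lhs = []
        for size in range(1, len(others) + 1):
            for lhs in combinations(others, size):
                # Prune: a superset of a valid LHS cannot be minimal.
                if any(set(m) <= set(lhs) for m in minimal_lhs):
                    continue
                if holds(rows, lhs, rhs):
                    minimal_lhs.append(lhs)
        found.extend((lhs, rhs) for lhs in minimal_lhs)
    return found


# Hypothetical sample data.
rows = [
    {"zip": "10115", "city": "Berlin", "name": "Ana"},
    {"zip": "10115", "city": "Berlin", "name": "Bo"},
    {"zip": "14482", "city": "Potsdam", "name": "Cy"},
]
fds = minimal_fds(rows, ["zip", "city", "name"])
```

Even with this pruning, the search can visit a number of candidate left-hand sides that grows exponentially with the number of attributes, which is the scaling problem the abstract describes.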