ABC of order dependencies

The VLDB Journal(2021)

Abstract
Band order dependencies (ODs) enhance constraint-based data quality by modeling the semantics of attributes that are monotonically related with small variations, without an intrinsic violation of semantics. The class of approximate band conditional ODs (abcODs) generalizes band ODs to make them more relevant to real-world applications by relaxing them to hold approximately, with some exceptions (abODs), and conditionally, on subsets of the data. We study the automatic discovery of abcODs to spare users the burden of specifying them manually. First, we propose an algorithm for discovering abODs that is more efficient than recent prior work; it relies on a new optimization for computing a longest monotonic band via dynamic programming and reduces the runtime from O(n^2) to O(n log n). We then devise a dynamic programming algorithm for abcOD discovery that determines the optimal solution in polynomial time. To improve performance without losing optimality, we adapt the algorithm to cheaply identify consecutive tuples that are guaranteed to belong to the same band. For generality, we extend our algorithms to discover bidirectional abcODs. Finally, we perform a thorough experimental evaluation of our techniques over real-world and synthetic datasets.
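
To give a feel for the O(n^2) to O(n log n) improvement mentioned in the abstract, the sketch below handles only a simplified special case: a band of width zero, where a longest monotonic band reduces to a longest non-decreasing subsequence. It uses the standard binary-search technique for that problem, not the paper's algorithm, which is not given in the abstract. The function names `longest_nondecreasing_length` and `min_exceptions` are hypothetical, chosen for illustration.

```python
from bisect import bisect_right


def longest_nondecreasing_length(values):
    """Length of a longest non-decreasing subsequence in O(n log n).

    Assumption: this is the zero-width special case of a monotonic band;
    it is NOT the algorithm from the paper.
    """
    # tails[k] holds the smallest possible last value of any
    # non-decreasing subsequence of length k + 1 seen so far.
    tails = []
    for v in values:
        # bisect_right lets equal values extend a subsequence (non-decreasing).
        pos = bisect_right(tails, v)
        if pos == len(tails):
            tails.append(v)   # v extends the longest subsequence found so far
        else:
            tails[pos] = v    # v is a smaller tail for length pos + 1
    return len(tails)


def min_exceptions(values):
    """Fewest values to drop so that the remainder is non-decreasing."""
    return len(values) - longest_nondecreasing_length(values)
```

For the values `[1, 2, 10, 3, 4, 5]`, dropping the single outlier `10` leaves a non-decreasing sequence, so `min_exceptions` returns 1. A quadratic dynamic program would compare every pair of positions; the binary search over `tails` is what brings the cost down to O(n log n) in this simplified setting.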
Keywords
Data quality, Data profiling, Data discovery