Approximate functional dependency discovery and one-dimensional selectivity estimation (2004)

Abstract
Tables are the fundamental data representation in modern database systems and one of the most fundamental representations in data mining and machine learning. Several researchers in the database community have shown that Shannon entropy can be used naturally to quantify statistical properties of tables. One interesting property highlighted is that of an approximate functional dependency (AFD), which quantifies one-way dependencies between columns. Entropy can also be used to quantify the "skewness" of the data in a column. Motivated by this work, we have examined two problems: (1) AFD discovery and (2) single-dimensional selectivity estimation. Problem (1) is motivated by the fact that such discovered dependencies can provide valuable knowledge about the data. Quite a bit of research has gone into the discovery of functional dependencies (FDs), but not their approximate counterparts. We take up the issue first by considering how to measure the degree to which an FD is approximate. We develop a set of axioms from intuition and prove that a unique measure satisfies them. Next, we address the algorithmic problem of discovering all minimal AFDs that hold in a table. We develop and compare a level-wise and a depth-first approach. Finally, we introduce an idea for applying discovered AFDs in relational query evaluation based on table decompositions. Problem (2) is motivated by the fact that modern relational database systems (e.g. Oracle, Access) estimate the result size (selectivity) of queries to aid in the search for efficient execution plans. A common approach is to build compact summary data structures and use these to estimate selectivity. One commonly used structure is the histogram. We examine the use of "skewness quantification" with entropy in the construction of histograms. In particular, we address the selectivity estimation problem for one commonly studied subclass of queries: single-dimensional selection queries.
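The abstract does not state the measure or the histogram construction the authors use, so the following is only an illustrative sketch under stated assumptions. It takes the conditional entropy H(Y | X) as one possible entropy-based indicator of how far a dependency X -> Y is from exact (it is zero when X determines Y in the data), uses plain column entropy as a rough skewness indicator, and estimates range selectivity from a simple equi-width histogram. All function names and the equi-width choice are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of a column."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def conditional_entropy(xs, ys):
    """H(Y | X) = H(X, Y) - H(X).

    Assumption: used here as a stand-in measure of approximateness.
    It is zero when every X value maps to a single Y value.
    """
    return entropy(list(zip(xs, ys))) - entropy(xs)


def equi_width_histogram(values, lo, hi, buckets):
    """Count values into `buckets` equal-width buckets covering [lo, hi]."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        # Clamp so that v == hi falls into the last bucket.
        i = min(int((v - lo) / width), buckets - 1)
        counts[i] += 1
    return counts


def estimate_range_selectivity(counts, lo, hi, a, b):
    """Estimate the fraction of rows with a <= v < b.

    Assumes values are spread uniformly inside each bucket.
    """
    width = (hi - lo) / len(counts)
    total = sum(counts)
    estimate = 0.0
    for i, c in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        estimate += c * overlap / width
    return estimate / total


if __name__ == "__main__":
    city = ["a", "a", "b", "b"]
    exact_zip = [1, 1, 2, 2]      # city -> zip holds exactly
    noisy_zip = [1, 2, 2, 2]      # one row violates the dependency
    print(conditional_entropy(city, exact_zip))
    print(conditional_entropy(city, noisy_zip))

    ages = [1, 2, 3, 4, 5, 6, 7, 8]
    hist = equi_width_histogram(ages, lo=0, hi=8, buckets=2)
    print(hist, estimate_range_selectivity(hist, 0, 8, 0, 4))
```

In this sketch a uniform column has the highest entropy for its number of distinct values, and a more skewed column scores lower; that is the sense in which entropy could guide where a histogram needs finer buckets.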
Keywords
database community, selectivity estimation problem, data mining, single dimensional selectivity estimation, one-dimensional selectivity estimation, approximate functional dependency discovery, Shannon entropy, algorithmic problem, compact summary data structure, approximate functional dependency, fundamental data representation, approximate counterpart