Regression-based Optimizer Statistics generation based on q-error.

Big Data (2022)

Abstract
Query optimizers in Relational Database Management Systems (RDBMS) rely on approximations of the data distributions of attributes. These approximations drive cardinality-based decisions such as estimating the selectivity of a predicate or finding the right join order for a multiway join in a SQL query. Commercial databases try to reach the best possible approximation of a data distribution using compressed histograms. The state of the art in measuring the quality of a histogram is the q-error. However, one shortcoming of measuring histogram quality with the q-error is that it pivots the entire quality on the worst case, that is, the infinity norm of the multiplicative errors of the estimations. This may not be representative of the workload or of the overall range of errors seen by the system. In this paper, we propose a dynamic programming algorithm that achieves an optimal histogram structure by keeping the overall multiplicative errors low. We propose a one-pass linear regression on the step function and apply dynamic programming methods to achieve an optimal histogram structure that optimizes the pair (slope β, intercept α). The algorithm achieves a lower slope and intercept for the resulting regression on the sorted order of multiplicative errors, thus reducing the overall estimation errors for the system. We refer to this pair as the Q-Regression of a histogram. We then introduce the metric QRegrArea as a new means to better quantify the cumulative distribution of errors for a given histogram. We provide experimental validation of the proposed methods against state-of-the-art models in the literature and industry.
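To make the contrast between the worst-case q-error and a regression over the sorted errors concrete, here is a minimal Python sketch. It is not the paper's algorithm: it uses the commonly cited q-error definition, max(estimate/actual, actual/estimate), and fits an ordinary least-squares line to the sorted errors against their rank, which is an assumption about the x-axis. The function names `q_error` and `fit_sorted_q_errors` are hypothetical, and the sketch does not attempt the dynamic programming step or QRegrArea, whose definitions the abstract does not give.

```python
def q_error(estimate, actual):
    """Multiplicative error max(est/act, act/est); both values must be positive."""
    if estimate <= 0 or actual <= 0:
        raise ValueError("q-error is only defined here for positive cardinalities")
    return max(estimate / actual, actual / estimate)


def fit_sorted_q_errors(estimates, actuals):
    """Return (alpha, beta, worst) for a least-squares line over sorted q-errors.

    Assumption: the x-axis is the 0-based rank of each error in sorted order.
    alpha is the intercept, beta the slope, worst the largest q-error.
    """
    errors = sorted(q_error(e, a) for e, a in zip(estimates, actuals))
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two estimates to fit a line")

    mean_x = (n - 1) / 2
    mean_y = sum(errors) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(errors))

    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta, errors[-1]


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    alpha, beta, worst = fit_sorted_q_errors([10, 5, 30], [10, 10, 10])
    print(f"intercept={alpha:.2f} slope={beta:.2f} worst q-error={worst:.2f}")
```

Two histograms with the same worst-case q-error can yield different (α, β) pairs under such a fit, which is the kind of distinction the abstract argues a single worst-case number cannot capture.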
Keywords
Database,Query Optimizers,Statistics,Histograms