Cardinality Estimation Based on Cluster Analysis.

JOURNAL OF INFORMATION SCIENCE AND ENGINEERING(2019)

引用 0|浏览27
暂无评分
摘要
In the cardinality estimation solutions based on multi-dimensional self-tuning histograms, periodical data scans are avoided and self-tuning histograms are constructed according to query feedback records (QFRs). We call this kind of cardinality estimation solutions the reactive solutions. In the existing reactive solutions, self-tuning histograms are constructed over the entire value ranges of the queried attributes and have large scales. The bucket number of a multi-dimensional self-tuning histogram increases exponentially with the dimension. That means the existing reactive solutions are stuck with the issue of "curse of dimension". Simultaneously, to construct and maintain a multi-dimensional self-tuning histogram over the entire value ranges of the queried attributes, a large number of QFRs must be accumulated at a long time span, and the cumbersome operations have to be executed repeatedly to meet a space budget, which makes the existing reactive solutions unpredictable and time-consuming. To address above issues, a new reactive solution is proposed in the paper. In the solution, the global self-tuning histogram covering the entire value range of the queried attributes is abandoned. When a new predicate p is executed, the Ward's minimum variance method is used to find k nearest QFRs with p from the QFR warehouse. Based on the found k QFRs, a micro self-tuning histogram only covering the neighborhood of p is constructed to help estimate the cardinality of p. The solution can be considered as a beneficial attempt to improve the cardinality estimation efficiency under high dimensions, and notably alleviate the issue of "curse of dimension". Furthermore, old QFRs can be replaced rapidly by the new QFRs and the data and workload changes can be timely reflected by micro self-tuning histograms. The process of meeting a space budget is eliminated completely, which makes the whole solution reliable and dexterous.
更多
查看译文
关键词
cardinality estimation,cluster analysis,ward's minimum variance method,self-tuning,query feedback record
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要