DensEst: Density Estimation for Data Mining in High Dimensional Spaces

SDM (2009)

Citations: 50 | Views: 45
Abstract
Subspace clustering and frequent itemset mining via "step-by-step" algorithms, which search the subspace/pattern lattice in a top-down or bottom-up fashion, do not scale to large high-dimensional databases. Recent "jump" algorithms directly choose candidate subspace regions or patterns. Their scalability and quality depend heavily on how these candidates are rated, as misled jumps incur poor results and costly candidate refinements. Existing techniques rely either on simple statistics with low estimation quality or on inefficient database scans. In this work, we propose DensEst, an efficient density estimator with significantly improved accuracy. It efficiently provides rough estimates of object counts in selective subspace regions. Furthermore, by incorporating correlations between dimensions, DensEst achieves estimates that are not only efficient but also highly accurate. We show how this density estimation technique can be easily integrated into subspace clustering and frequent itemset mining algorithms to improve both their efficiency and accuracy. In thorough experiments, we demonstrate the performance of our density estimation technique and show the efficiency and accuracy improvements it brings to existing algorithms.
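The abstract contrasts count estimates built from simple per-dimension statistics with estimates that incorporate correlations between dimensions. As a rough illustration only, and not DensEst's published estimator, the Python sketch below compares two generic techniques on a toy query region: an independence estimate that multiplies 1D histogram fractions, and a chain (first-order Markov) factorization that combines 2D histograms of dimension pairs. The data layout (points pre-discretized to integer bins, regions given as half-open bin intervals per dimension) and all function names (`build_histograms`, `independence_estimate`, `chain_estimate`, and the helpers) are hypothetical choices made for this example.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence, Tuple

# Hypothetical layout (not from the paper): points are already discretized to
# integer grid bins, and a query region maps a dimension to a half-open bin
# interval [lo, hi).
Region = Dict[int, Tuple[int, int]]


def in_range(value: int, bounds: Tuple[int, int]) -> bool:
    lo, hi = bounds
    return lo <= value < hi


def true_count(points: Sequence[Sequence[int]], region: Region) -> int:
    """Exact object count via a full scan (the costly baseline)."""
    return sum(all(in_range(p[d], b) for d, b in region.items()) for p in points)


def build_histograms(points: Sequence[Sequence[int]]):
    """1D histograms per dimension and 2D histograms for every dimension pair."""
    n_dims = len(points[0])
    h1 = [Counter(p[d] for p in points) for d in range(n_dims)]
    h2 = {(d, e): Counter((p[d], p[e]) for p in points)
          for d, e in combinations(range(n_dims), 2)}
    return h1, h2


def frac1(h1, n, d, b) -> float:
    """Fraction of objects whose value in dimension d falls in interval b."""
    return sum(c for v, c in h1[d].items() if in_range(v, b)) / n


def frac2(h2, n, d, e, bd, be) -> float:
    """Fraction of objects inside the rectangle bd x be on dims (d, e), d < e."""
    return sum(c for (u, v), c in h2[(d, e)].items()
               if in_range(u, bd) and in_range(v, be)) / n


def independence_estimate(h1, n, region: Region) -> float:
    """Assumes independent dimensions: N times the product of 1D fractions."""
    est = float(n)
    for d, b in region.items():
        est *= frac1(h1, n, d, b)
    return est


def chain_estimate(h1, h2, n, region: Region) -> float:
    """Chain factorization over the sorted selected dims:
    P(x1..xk) ~ P(x1, x2) * prod_i P(xi, xi+1) / P(xi)."""
    dims = sorted(region)
    if len(dims) == 1:
        return n * frac1(h1, n, dims[0], region[dims[0]])
    est = frac2(h2, n, dims[0], dims[1], region[dims[0]], region[dims[1]])
    for prev, nxt in zip(dims[1:], dims[2:]):
        shared = frac1(h1, n, prev, region[prev])
        if shared == 0.0:
            return 0.0  # empty marginal, so the region is empty as well
        est *= frac2(h2, n, prev, nxt, region[prev], region[nxt]) / shared
    return n * est


if __name__ == "__main__":
    # Toy data with three perfectly correlated dimensions (x0 == x1 == x2).
    points = [(v, v, v) for v in range(10) for _ in range(10)]
    h1, h2 = build_histograms(points)
    region = {0: (0, 5), 1: (0, 5), 2: (0, 5)}
    print(true_count(points, region))                      # → 50
    print(independence_estimate(h1, len(points), region))  # → 12.5
    print(chain_estimate(h1, h2, len(points), region))     # → 50.0
```

On this perfectly correlated toy data, the independence assumption underestimates the true count of 50 by a factor of four, while the pairwise chain recovers it exactly. Keeping a 2D histogram for every pair of dimensions grows quadratically with dimensionality, so this sketch trades storage for accuracy in a way a practical estimator would need to control.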
Keywords
density estimation,bottom up,high dimensional data,top down,data mining