Simple and Efficient Projective Clustering.
KDIR 2010: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL(2010)
摘要
We describe a new Monte Carlo algorithm for projective clustering that is both simple and efficient. Like previous Monte Carlo algorithms, we perform trials that sample a small subset of the data points to determine the dimensions in which the points are sufficiently close to form a cluster and then search the rest of the data for data points that are part of the cluster. However, our algorithm differs from previous algorithms in the method by which the dimensions of the cluster are determined and the method for determining the points in the cluster. This allows us to use smaller subsets of the data to determine the cluster dimensions and achieve improved efficiency over previous algorithms. The complexity of our algorithm is O(nd(1+log alpha/log beta)) where n is the number of data points, d is the number of dimensions in the space, and alpha and beta are parameters that specify which clusters should be found. To our knowledge, this is the lowest published complexity for an algorithm that is able to place high bounds on the probability of success. We present experiments that show that our algorithm outperforms previous algorithms on real and synthetic data.
更多查看译文
关键词
Projective clustering,Monte Carlo algorithm
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络