Fast Hierarchical Clustering Based On Compressed Data And Optics

PKDD '00: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery(2000)

引用 23|浏览30
暂无评分
摘要
One way to scale up clustering algorithms is to squash the data by some intelligent compression technique and cluster only the compressed data records. Such compressed data records can e.g. be produced by the BIRCH algorithm. Typically they consist of the sufficient statistics of the form (N, X, X-2 ) where N is the number of points, X is the (vector-)sum, and X-2 is the square sum of the points. They can be used directly to speed up k-means type of clustering algorithms, but it is not obvious how to use them in a hierarchical clustering algorithm. Applying a hierarchical clustering algorithm e.g. to the centers of compressed subclusters produces a very weak result. The reason is that hierarchical clustering algorithms are based on the distances between data points and that the interpretaion of the result relies heavily on a correct graphical representation of these distances. In this paper, we introduce a method by which the sufficient statistics (N, X, X-2 ) of subcluster can be utilized in the hierarchical clustering method OPTICS. We show how to generate appropriate distance information about compressed data points, and how to adapt the graphical representation of the clustering result. A performance evaluation using OPTICS in combination with BIRCH demonstrates that our approach is extremely efficient (speed-up factors up to 1700) and produces high quality results.
更多
查看译文
关键词
hierarchical clustering algorithm,clustering algorithm,data point,data record,sufficient statistic,clustering result,hierarchical clustering method,BIRCH algorithm,high quality result,weak result,Compressed Data,Fast Hierarchical
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要