A Fast Granular-Ball-Based Density Peaks Clustering Algorithm for Large-Scale Data

IEEE transactions on neural networks and learning systems(2023)

引用 2|浏览18
暂无评分
摘要
Density peaks clustering algorithm (DP) has difficulty in clustering large-scale data, because it requires the distance matrix to compute the density and d-distance for each object, which has O(n(2)) time complexity. Granular ball (GB) is a coarse-grained representation of data. It is based on the fact that an object and its local neighbors have similar distribution and they have high possibility of belonging to the same class. It has been introduced into supervised learning by Xia et al. to improve the efficiency of supervised learning, such as support vector machine, k-nearest neighbor classification, rough set, etc. Inspired by the idea of GB, we introduce it into unsupervised learning for the first time and propose a GB-based DP algorithm, called GB-DP. First, it generates GBs from the original data with an unsupervised partitioning method. Then, it defines the density of GBs, instead of the density of objects, according to the centers, radius, and distances between its members and centers, without setting any parameters. After that, it computes the distance between the centers of GBs as the distance between GBs and defines the delta-distance of GBs. Finally, it uses GBs ' density and delta-distance to plot the decision graph, employs DP algorithm to cluster them, and expands the clustering result to the original data. Since there is no need to calculate the distance between any two objects and the number of GBs is far less than the scale of a data, it greatly reduces the running time of DP algorithm. By comparing with k-means, ball k-means, DP, DPC-KNN-PCA, FastDPeak, and DLORE-DP, GB-DP can get similar or even better clustering results in much less running time without setting any parameters. The source code is available at https://github.com/DongdongCheng/GB-DP.
更多
查看译文
关键词
Clustering,density peaks (DPs),granular balls (GBs),large scale
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要