Distributed Density Peaks Clustering Revisited

IEEE Transactions on Knowledge and Data Engineering (2022)

Density Peaks (DP) clustering organizes data into clusters by finding peaks in dense regions. This involves computing the density ($\rho$) and distance ($\delta$) of every point. As such, although DP has been very effective in producing high-quality clusters, its complexity is $O(N^2)$, where $N$ is the number of data points. In this paper, we propose a fast distributed density peaks clustering algorithm, FDDP, based on the z-value index. In FDDP, we first employ the z-value index to map multi-dimensional data points into one-dimensional space, and then range-partition the data according to the z-value to balance the load across the processing nodes. We keep the overlapping range minimal while still handling computations at the boundary points. We also propose FC, an efficient algorithm that employs a forward computing strategy to calculate $\rho$ in linear time. Additionally, we propose another algorithm, CB, which uses a caching and efficient searching strategy to compute $\delta$. Overall, FDDP reduces the time complexity from $O(N^2)$ to $O(N \log N)$. We provide a theoretical analysis of FDDP and evaluate it empirically. Our experimental results show that FDDP significantly outperforms the state-of-the-art algorithms.
Keywords: clustering, distributed computing, z-order curve, density peaks clustering
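
The z-value mapping and range partitioning described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `z_value` and `range_partition`, the `bits` parameter, and the assumption that coordinates are already quantized to non-negative integers are all hypothetical choices made for illustration. The overlapping boundary ranges that FDDP maintains are omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def z_value(coords: Sequence[int], bits: int = 16) -> int:
    """Interleave the low `bits` bits of each coordinate (Morton code).

    Assumes coordinates are non-negative integers; real-valued data
    would first have to be quantized to an integer grid.
    """
    dims = len(coords)
    z = 0
    for bit in range(bits):
        for d, c in enumerate(coords):
            # Bit `bit` of dimension `d` lands at position bit * dims + d.
            z |= ((c >> bit) & 1) << (bit * dims + d)
    return z


def range_partition(points: Sequence[Point], num_parts: int,
                    bits: int = 16) -> List[List[Point]]:
    """Sort points by z-value and cut them into contiguous ranges.

    Each range holds a near-equal number of points, which is one simple
    way to balance load (an assumption, not the paper's exact scheme).
    """
    ordered = sorted(points, key=lambda p: z_value(p, bits))
    n = len(ordered)
    bounds = [n * k // num_parts for k in range(num_parts + 1)]
    return [ordered[bounds[k]:bounds[k + 1]] for k in range(num_parts)]


if __name__ == "__main__":
    grid = [(x, y) for x in range(4) for y in range(4)]
    for i, part in enumerate(range_partition(grid, 4)):
        print(i, part)
```

Because points that are close on the z-order curve tend to be close in the original space, each contiguous range mostly contains spatial neighbors. The curve can still jump between distant cells, which is one reason boundary points need extra handling.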