# Distributed Density Peaks Clustering Revisited

IEEE Transactions on Knowledge and Data Engineering (2022)

Abstract

Density Peaks (DP) clustering organizes data into clusters by finding peaks in dense regions. This involves computing the density ($\rho$) and distance ($\delta$) of every point. As such, though DP has been very effective in producing high-quality clusters, its complexity is $O(N^2)$, where $N$ is the number of data points. In this paper, we propose a fast distributed density peaks clustering algorithm, FDDP, based on the z-value index. In FDDP, we first employ the z-value index to map multi-dimensional data points into one-dimensional space, and then range-partition the data according to the z-value to balance the load across the processing nodes. We keep the overlap between ranges minimal while still handling computations at the boundary points. We also propose FC, an efficient algorithm that employs a forward computing strategy to calculate $\rho$ linearly. Additionally, we propose another algorithm, CB, which uses a caching and efficient searching strategy to compute $\delta$. Moreover, FDDP is able to reduce the time complexity from $O(N^2)$ to $O(N \cdot \log N)$. We provide a theoretical analysis of FDDP and evaluate it empirically. Our experimental results show that FDDP outperforms the state-of-the-art algorithms significantly.
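The two ingredients named in the abstract, the $O(N^2)$ DP baseline and the z-value mapping with range partitioning, can be illustrated with a minimal Python sketch. This is not the paper's FDDP, FC, or CB implementation, none of which the abstract specifies. It assumes non-negative integer coordinates, a cutoff-distance density (count of neighbours within `d_c`, as in the original DP formulation), and equal-count partitions with no overlap handling. The function names `z_value`, `range_partition`, and `naive_density_peaks` are hypothetical.

```python
import numpy as np


def z_value(coords, bits=8):
    """Interleave the bits of non-negative integer coordinates (Morton code)."""
    z = 0
    d = len(coords)
    for b in range(bits):
        for k, c in enumerate(coords):
            # Bit b of dimension k goes to position b * d + k of the z-value.
            z |= ((int(c) >> b) & 1) << (b * d + k)
    return z


def range_partition(points, n_parts, bits=8):
    """Sort points by z-value and cut them into contiguous, near-equal ranges.

    Assumption: equal-count cuts stand in for load balancing; the overlapping
    boundary ranges mentioned in the abstract are not modelled here.
    """
    z = np.array([z_value(p, bits) for p in points])
    order = np.argsort(z, kind="stable")
    return [points[idx] for idx in np.array_split(order, n_parts)]


def naive_density_peaks(points, d_c):
    """O(N^2) baseline: rho counts neighbours within d_c, delta is the
    distance to the nearest point of strictly higher density."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    rho = (dist < d_c).sum(axis=1) - 1  # subtract 1 to exclude the point itself
    delta = np.empty(len(points))
    for i in range(len(points)):
        denser = rho > rho[i]
        # A point with no denser neighbour gets its maximum distance instead.
        delta[i] = dist[i, denser].min() if denser.any() else dist[i].max()
    return rho, delta


if __name__ == "__main__":
    grid = np.array([[x, y] for x in range(4) for y in range(4)])
    for part in range_partition(grid, n_parts=4, bits=2):
        print(part.tolist())
```

On the 4×4 grid in the usage example, each of the four partitions holds one 2×2 quadrant, which shows why sorting by z-value tends to keep spatially close points on the same node. The pairwise distance matrix in `naive_density_peaks` is where the quadratic cost of the baseline comes from.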

Keywords

Clustering, distributed computing, z-order curve, density peaks clustering
