UP-DPC: Ultra-scalable parallel density peak clustering

Information Sciences (2024)

Abstract
Density Peak Clustering (DPC) is a highly effective density-based clustering algorithm, but its scalability is limited by the expensive Density Peak Estimation (DPE) step. To address this challenge, we propose UP-DPC: Ultra-Scalable Parallel Density Peak Clustering, a novel framework that employs approximate Density Peak Estimation and performs DPC on LDP-wise graphs. This approach enables UP-DPC to handle datasets of arbitrary scale without relying on spatial indexing for acceleration. Furthermore, we introduce a five-layer computational architecture and leverage parallel computation techniques to further enhance the speed and efficiency of UP-DPC. To evaluate the scalability and effectiveness of UP-DPC, we conduct extensive experiments on 14 datasets, including large/web-scale datasets, and compare UP-DPC with 21 algorithms. Notably, on the MNIST8M dataset consisting of 8,000k data objects, UP-DPC achieves an NMI (Normalized Mutual Information) value of 0.6464 in just 35.41 seconds, outperforming the state-of-the-art GPU-based method, which achieves an NMI of only 0.045 in 56.96 seconds. These results demonstrate the superior scalability and effectiveness of UP-DPC in handling large/web-scale datasets. The proposed framework offers significant improvements over existing methods and shows promise as a solution for density-based clustering tasks.
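To illustrate why the DPE step limits scalability, the following is a minimal sketch of exact density peak estimation as it appears in classic DPC (Rodriguez and Laio, 2014), not the approximate scheme used by UP-DPC, which the abstract does not detail. The function names (`density_peak_estimation`, `dpc_labels`), the cutoff parameter `d_c`, and the choice of centers by the product of density and distance are illustrative assumptions. The full pairwise distance matrix makes the cost quadratic in the number of objects.

```python
import math

def density_peak_estimation(points, d_c):
    """Exact DPE as in classic DPC; needs all n*n pairwise distances."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho[i]: number of other points closer than the cutoff d_c (assumed parameter)
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points by decreasing density; ties are broken by index
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n    # distance to the nearest point of higher density
    parent = [-1] * n    # index of that nearest higher-density point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour; use its max distance
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j
    return rho, delta, parent, order

def dpc_labels(points, d_c, k):
    """Pick k centers by largest rho*delta, then propagate labels via parents."""
    rho, delta, parent, order = density_peak_estimation(points, d_c)
    n = len(points)
    centers = sorted(range(n), key=lambda i: -(rho[i] * delta[i]))[:k]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Decreasing-density order guarantees a parent is labeled before its child
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

# Two well-separated toy blobs (made-up data for illustration only)
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = dpc_labels(points, d_c=1.2, k=2)
```

On this toy input the two blob centers, `(0.5, 0.5)` and `(10.5, 10.5)`, have the highest density and the largest distance to any denser point, so they are selected as centers and the remaining points inherit their labels. At the scale of MNIST8M, the quadratic distance computation in this exact version is what an approximate DPE is meant to avoid.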
Keywords
Clustering, Density peak estimation, Large-scale, Scalability, Parallel computation