Pardicle: Parallel Approximate Density-Based Clustering

SC(2014)

Abstract
DBSCAN is a widely used isodensity-based clustering algorithm for particle data, well known for its ability to isolate arbitrarily shaped clusters and to filter out noise. The algorithm is super-linear (O(n log n)) and computationally expensive for large datasets. Given the need for speed, we propose a fast heuristic algorithm for DBSCAN using density-based sampling, which matches exact algorithms in quality but is more than an order of magnitude faster. Our experiments on massive astrophysics and synthetic datasets (8.5 billion numbers) show that our approximate algorithm is up to 56x faster than exact algorithms with almost identical quality (Omega-Index >= 0.99). We also develop a new parallel DBSCAN algorithm that uses dynamic partitioning to improve load balancing and locality. We demonstrate near-linear speedup on shared-memory (15x using 16 cores on a single-node Intel(R) Xeon(R) processor) and distributed-memory (3917x using 4096 cores across multiple nodes) computers, with an additional 2x performance improvement using Intel(R) Xeon Phi(TM) coprocessors. Additionally, existing exact algorithms can achieve a speedup of up to 3.4x using dynamic partitioning.
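The abstract does not give the algorithm itself, so the following is only a minimal sketch of exact, sequential DBSCAN built on a disjoint-set (union-find) structure, the data structure named in the keywords. It is not the Pardicle algorithm: it omits the density-based sampling and the dynamic partitioning, and it uses a brute-force O(n^2) neighbor search for brevity. The names `DisjointSet`, `dbscan_union_find`, `eps`, and `min_pts` are hypothetical and chosen for illustration.

```python
from math import dist


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def dbscan_union_find(points, eps, min_pts):
    """Return one cluster label per point; noise points get -1.

    Illustrative only: neighbors are found by brute force, and a point
    counts itself when compared against min_pts (an assumption).
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    # Merge every pair of core points that lie within eps of each other.
    ds = DisjointSet(n)
    for i in range(n):
        if is_core[i]:
            for j in neighbors[i]:
                if is_core[j]:
                    ds.union(i, j)

    # Number the clusters in order of first appearance of their core points.
    labels = [-1] * n
    cluster_ids = {}
    for i in range(n):
        if is_core[i]:
            labels[i] = cluster_ids.setdefault(ds.find(i), len(cluster_ids))

    # Attach each border point to the cluster of its first core neighbor.
    for i in range(n):
        if not is_core[i]:
            for j in neighbors[i]:
                if is_core[j]:
                    labels[i] = labels[j]
                    break
    return labels
```

For example, calling `dbscan_union_find([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], eps=1.5, min_pts=3)` groups the first three points into one cluster, the next three into a second, and marks the isolated last point as noise.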
Keywords
Density-based clustering, approximate clustering algorithm, Union-Find algorithm, disjoint-set data structure