BD-CATS: Big Data Clustering at Trillion Particle Scale

Md. Mostofa Ali Patwary, Suren Byna, Nadathur Rajagopalan Satish, Narayanan Sundaram, Zarija Lukic, Vadim Roytershteyn, Michael J. Anderson, Yushu Yao, Prabhat, Pradeep Dubey

SC (2015)



Abstract

Modern cosmology and plasma physics codes are now capable of simulating trillions of particles on petascale systems. Each timestep output from such simulations is on the order of 10s of TBs. Summarizing and analyzing raw particle data is challenging, and scientists often focus on density structures, whether in the real 3D space, or a high-dimensional phase space. In this work, we develop a highly scalable version of the clustering algorithm DBSCAN, and apply it to the largest datasets produced by state-of-the-art codes. Our system, called BD-CATS, is the first one capable of performing end-to-end analysis at trillion particle scale (including: loading the data, geometric partitioning, computing kd-trees, performing clustering analysis, and storing the results). We show analysis of 1.4 trillion particles from a plasma physics simulation, and a 10,240^3 particle cosmological simulation, utilizing ~100,000 cores in 30 minutes. BD-CATS is helping infer mechanisms behind particle acceleration in plasma physics and holds promise for qualitatively superior clustering in cosmology. Both of these results were previously intractable at the trillion particle scale.


Keywords

Density-based clustering, DBSCAN, Parallel I/O, KDTree
