An efficient density-based clustering for multi-dimensional database

2017 4th International Conference on Information, Cybernetics and Computational Social Systems (ICCSS)(2017)

Abstract
Cluster analysis aims to classify data elements into different categories according to their similarity. It is a common task in data mining and is useful in various fields, including pattern recognition, machine learning, and information retrieval. Because the area has been studied extensively, many clustering methods have been proposed in the literature, and some of them focus on mining clusters with arbitrary shapes. However, when dealing with large-scale, multi-dimensional data, there is still a need for an efficient and versatile clustering method that can identify arbitrary shapes embedded in a multi-dimensional space. In this paper, we propose a density-based clustering algorithm that adopts a divide-and-conquer strategy. To handle large-scale, multi-dimensional data, we first divide the data into grid cells, which is very efficient in large-scale cases where other algorithms often fail. Moreover, rather than requiring the grid cell width to be tuned by hand, we present a way to determine it automatically. Then, we propose a flood-fill-like algorithm to identify clusters with arbitrary shapes over these grid cells. Finally, extensive experiments on both synthetic and real-world databases show that the proposed algorithm efficiently finds accurate clusters in both low-dimensional and multi-dimensional databases.
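The abstract does not spell out the paper's actual procedure, so the following is only a minimal Python sketch of the general grid-plus-flood-fill idea it describes, under stated assumptions. The helper names (`auto_cell_width`, `grid_cells`, `flood_fill_clusters`, `cluster`), the density threshold `min_pts`, face-adjacent cell connectivity, and the k-th-nearest-neighbour width heuristic are all hypothetical illustrations, not the authors' method.

```python
import math
from collections import deque


def auto_cell_width(points, k=4):
    """Hypothetical width heuristic (NOT from the paper): mean distance from
    each point to its k-th nearest neighbour. Brute force, O(n^2)."""
    if len(points) < 2 or k < 1:
        raise ValueError("need at least two points and k >= 1")
    kth = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kth.append(d[min(k, len(d)) - 1])
    return sum(kth) / len(kth)


def grid_cells(points, width):
    """Divide step: map each point index to the integer grid cell containing it."""
    if width <= 0:
        raise ValueError("cell width must be positive")
    cells = {}
    for idx, p in enumerate(points):
        key = tuple(math.floor(x / width) for x in p)
        cells.setdefault(key, []).append(idx)
    return cells


def flood_fill_clusters(cells, min_pts):
    """Flood fill (BFS) over dense cells; two cells are treated as connected
    when they differ by +/-1 in exactly one coordinate (assumed adjacency)."""
    dense = {c for c, members in cells.items() if len(members) >= min_pts}
    labels = {}
    cluster_id = 0
    for start in sorted(dense):  # sorted so cluster ids are deterministic
        if start in labels:
            continue
        labels[start] = cluster_id
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for dim in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                    if nb in dense and nb not in labels:
                        labels[nb] = cluster_id
                        queue.append(nb)
        cluster_id += 1
    return labels


def cluster(points, width=None, min_pts=2):
    """Return one label per point; -1 marks points in sparse (noise) cells."""
    if width is None:
        width = auto_cell_width(points)
    cells = grid_cells(points, width)
    cell_labels = flood_fill_clusters(cells, min_pts)
    labels = [-1] * len(points)
    for key, members in cells.items():
        for idx in members:
            labels[idx] = cell_labels.get(key, -1)
    return labels


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),
           (10.0, 10.0), (10.2, 10.1), (50.0, 50.0)]
    print(cluster(pts, width=1.0, min_pts=2))  # [0, 0, 0, 0, 1, 1, -1]
```

Restricting adjacency to cells that differ in one coordinate gives each cell 2d neighbours instead of 3^d - 1, which keeps the neighbour lookup linear in the dimension d. The trade-off is that diagonally touching dense cells are not merged. The brute-force width heuristic is quadratic in the number of points and is only meant to show where an automatic width would plug in.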
Keywords
Clustering, Grid, KNN Graph, Large-scale data