# Random Sample Partition-Based Clustering Ensemble Algorithm for Big Data

IEEE BigData (2021)

Abstract

This paper proposes a novel random sample partition-based clustering ensemble (RSP-CE) algorithm for big data clustering problems. The RSP-CE algorithm has three key components: generating base clustering results on RSP data blocks, harmonizing the base clustering results with the maximum mean discrepancy (MMD) criterion, and refining the RSP clustering results. Because RSP data blocks have sample distributions consistent with that of the whole big data set, base clustering results obtained on different data subsets can be used to approximate the clustering result on the whole data set. In experiments against 5 other well-known clustering ensemble algorithms on 4 big data sets, RSP-CE obtains better normalized mutual information (NMI) and Fowlkes-Mallows index (FMI) values with less training time, which demonstrates that RSP-CE is a viable approach to big data clustering problems.
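The abstract does not spell out how RSP data blocks are built or how the MMD criterion is evaluated, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a block is produced by shuffling the records and dealing them into disjoint subsets, and it uses the standard biased estimate of squared MMD with a Gaussian kernel. The function names (`random_sample_partition`, `mmd_squared`) and the `bandwidth` parameter are hypothetical, chosen for illustration.

```python
import math
import random


def random_sample_partition(data, num_blocks, seed=0):
    """Shuffle the records and deal them into num_blocks disjoint blocks.

    Assumption: a plain shuffle-and-split stands in for the RSP procedure.
    """
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i::num_blocks] for i in range(num_blocks)]


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs, y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(400)]
    blocks = random_sample_partition(data, num_blocks=4)

    # Two blocks drawn from the same data should have a small MMD,
    # while a deliberately shifted copy should have a much larger one.
    shifted = [(x + 3.0, y + 3.0) for x, y in blocks[1]]
    print("same distribution:", mmd_squared(blocks[0], blocks[1]))
    print("shifted block:    ", mmd_squared(blocks[0], shifted))
```

In this sketch a small MMD between two blocks is read as evidence that they share a sample distribution, which is the property the abstract relies on when it uses clusterings of separate blocks to approximate a clustering of the whole data set.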


Key words

Random sample partition, clustering ensemble, big data, maximum mean discrepancy
