Determination of the Optimal Number of Clusters: A Fuzzy-Set Based Method

Sy Dzung Nguyen, Vu Song Thuy Nguyen, Nhat Truong Pham

IEEE Transactions on Fuzzy Systems (2022)

Abstract
The optimal number of clusters (C_opt) is one of the determinants of clustering efficiency. In this article, we present a new method of quantifying C_opt for centroid-based clustering. First, we propose a new clustering validity index named fRisk(C), based on fuzzy set theory. It normalizes and accumulates the local risks arising from each action, either splitting data from a cluster or merging data into a cluster. fRisk(C) exploits the local distribution information of the database to capture the global information of the clustering process in the form of a risk degree. Based on the monotonically decreasing property of fRisk(C), which is proved theoretically, we present a new fRisk-based algorithm named fRisk4-bA for determining C_opt. In the algorithm, the well-known L-method is employed as a supplementary tool to locate C_opt on the graph of fRisk(C). In addition to a theoretical proof of the method's stable convergence trend, numerical surveys are carried out. The surveys show that the strengths of the proposed method are its high reliability and stability, as well as its sensitivity in separating/merging clusters in high-density areas, even in the presence of noise in the databases.
Keywords
Cluster validity, clustering validity index (CVI), evaluating clustering result, number of clusters