WeDIV – An improved k-means clustering algorithm with a weighted distance and a novel internal validation index

Egyptian Informatics Journal (2022)

Abstract
Designing appropriate similarity metrics (distances) and estimating the optimal number of clusters are two important issues in cluster analysis. This study proposes an improved k-means clustering algorithm involving a Weighted Distance and a novel Internal Validation index (WeDIV). The weighted distance, EP_dis, was designed with a weighting strategy that balances the relative contributions of the Euclidean and Pearson distances. This strategy can effectively capture information reflecting both the global spatial correlation and the local variable trends in high-dimensional space. The new internal validation index, RCH, inspired by the Calinski-Harabasz (CH) index and analysis of variance, was developed to automatically estimate the optimal number of clusters. EP_dis was proved mathematically reliable and was validated on two simulated datasets. Four simulated datasets with different properties were used to validate the effectiveness of RCH. Furthermore, we compared the clustering performance of WeDIV with 12 prevailing clustering algorithms on 16 UCI datasets. The results demonstrated that WeDIV outperforms the others whether or not the number of clusters is specified.

(c) 2022 THE AUTHORS. Published by Elsevier BV on behalf of Faculty of Computers and Artificial Intelligence, Cairo University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
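The abstract describes EP_dis only as a weighted combination of the Euclidean and Pearson distances, and RCH only as an index inspired by the Calinski-Harabasz (CH) index and analysis of variance. As a rough illustration of those two ingredients, the Python sketch below assumes a simple convex combination w * d_E + (1 - w) * d_P, where the weight w and the helper names are hypothetical; the paper's actual weighting and normalization are not reproduced here. It also computes the standard CH index (between-cluster dispersion over within-cluster dispersion, each divided by its degrees of freedom), not RCH itself.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation, so values lie in [0, 2]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # Correlation is undefined for a constant vector; treating it as
        # uncorrelated (distance 1) is an assumption of this sketch.
        return 1.0
    return 1.0 - cov / (sx * sy)


def weighted_distance(x, y, w=0.5):
    """Hypothetical convex combination of Euclidean and Pearson distances.

    w = 1 gives pure Euclidean distance, w = 0 pure Pearson distance.
    The default w = 0.5 is an arbitrary placeholder, not the paper's value.
    """
    return w * euclidean(x, y) + (1.0 - w) * pearson_distance(x, y)


def calinski_harabasz(points, labels):
    """Standard CH index: (B / (k - 1)) / (W / (n - k))."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if k < 2 or n <= k:
        raise ValueError("CH index needs 2 <= k < n")
    dim = len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Between-cluster dispersion: cluster size times squared centroid offset.
        between += len(members) * sum((centroid[d] - overall[d]) ** 2 for d in range(dim))
        # Within-cluster dispersion: squared distances of members to their centroid.
        within += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    # Perfectly correlated vectors: Pearson part is ~0, Euclidean part is sqrt(14).
    print(weighted_distance([1, 2, 3], [2, 4, 6]))
    # Two well-separated clusters give a large CH value (200.0 here).
    print(calinski_harabasz([[0, 0], [0, 1], [10, 0], [10, 1]], [0, 0, 1, 1]))
```

In an automatic-k setting, one would run the clustering for several candidate k and keep the k with the best index value; how RCH modifies CH to make that choice more reliable is described in the paper, not here.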
Keywords
Clustering, EP_dis, Weighted distance, RCH, Internal validation index