Early Classification: A New Heuristic to Improve the Classification Step of K-Means.

JIDM(2013)

引用 23|浏览25
暂无评分
摘要
Cluster analysis is the study of algorithms and techniques for grouping objects according to their intrinsic characteristics and similarity. A widely studied and popular clustering algorithm is K-Means, which is characterized by its ease of implementation and high computational cost. Although various performance improvements have been proposed for K-Means, the algorithm is still considered an expensive alternative for clustering large scale datasets. This work proposes a new heuristic for reducing the number of calculations needed in the classification step of K-Means, without high loss of quality reduction, by using statistical information about the displacement of centroids at each iteration. Our heuristic, called Early Classification (EC for short), identifies and excludes from future calculations those objects that, according to an equidistance threshold, have low likelihood of cluster change in subsequent iterations. To validate our proposal, a set of experiments is performed on synthetic and real-world datasets from the UCI Machine Learning repository. In addition, we compared our heuristic against two other state-of-the-art variations of K-Means. The Wilcoxon signed-rank test was used for calculating the statistical significance. The results are promising since the execution time of K-Means was reduced up to 82.49%, with a quality reduction of only 3.31%. Moreover, as the experiments will show, the superiority of our method is even more evident on large datasets.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要