Optimization of the K-means Algorithm for the Solution of High Dimensional Instances

AIP Conference Proceedings (2016)

Abstract
This paper addresses the problem of clustering instances with a large number of dimensions. In particular, it proposes a new heuristic for reducing the complexity of the K-means algorithm. Traditionally, two approaches are used to cluster high-dimensional instances. The first runs a preprocessing step that removes attributes of limited importance. The second, called divide and conquer, creates subsets that are clustered separately and then integrates their results through post-processing. In contrast, this paper proposes reducing the number of distance calculations from the objects to the centroids in the classification step. The heuristic is derived from visual observation of the K-means clustering process, in which objects were found to migrate only to adjacent clusters, without crossing distant clusters. For each object, distances therefore need to be computed only to the centroids of the clusters to which it may plausibly be reassigned, which can significantly reduce the total number of distance calculations. To validate the proposed heuristic, a set of experiments was designed using synthetic, high-dimensional instances. One of the most notable results was obtained for an instance of 25,000 objects and 200 dimensions, for which execution time was reduced by up to 96.5% while the quality of the solution decreased by only 0.24% compared with the standard K-means algorithm.
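The idea of restricting the classification step to adjacent clusters can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how adjacency is determined, so the sketch assumes that the clusters adjacent to a given cluster are those with the `n_neighbors` nearest centroids. The names `restricted_kmeans`, `neighbor_lists`, `sq_dist`, and the parameter `n_neighbors` are hypothetical and introduced only for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighbor_lists(centroids, n_neighbors):
    """For each centroid, return its own index followed by the indices of
    the n_neighbors closest other centroids.

    ASSUMPTION: "adjacent" is taken to mean "among the nearest centroids";
    the abstract does not specify the adjacency rule.
    """
    result = []
    for i, c in enumerate(centroids):
        others = sorted(
            (j for j in range(len(centroids)) if j != i),
            key=lambda j: sq_dist(c, centroids[j]),
        )
        result.append([i] + others[:n_neighbors])
    return result


def restricted_kmeans(points, k, n_neighbors=2, max_iter=100, seed=0):
    """K-means whose reassignment step only considers adjacent clusters."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    # Initial classification: full comparison against all k centroids,
    # exactly as in standard K-means.
    labels = [
        min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
    ]

    for _ in range(max_iter):
        # Update step: each centroid becomes the mean of its members.
        # Empty clusters keep their previous centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]

        # Restricted classification step: each object is compared only with
        # its current centroid and that centroid's assumed neighbors.
        candidates = neighbor_lists(centroids, n_neighbors)
        new_labels = [
            min(candidates[lab], key=lambda j: sq_dist(p, centroids[j]))
            for p, lab in zip(points, labels)
        ]

        if new_labels == labels:  # no object changed cluster: converged
            break
        labels = new_labels

    return labels, centroids
```

In this sketch each object is compared with `n_neighbors + 1` centroids per iteration instead of all `k`, at the cost of recomputing the centroid-to-centroid neighbor lists once per iteration. Setting `n_neighbors = k - 1` makes every cluster a candidate, so the reassignment step becomes the same as in standard K-means.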
Keywords
K-means, Complexity Reduction, High Dimensional Instances