Using internal validity measures to compare clustering algorithms

International Conference on Machine Learning (2015)

Abstract

Recently, significant effort has been made to automate the machine learning process in the context of supervised learning. This automation includes, among other things, the selection of an appropriate learning algorithm and corresponding hyperparameters for a particular learning problem. In contrast, such problems are much less studied for unsupervised tasks such as clustering. Nevertheless, users who want to cluster a data set are confronted with similar problems: a clustering algorithm has to be selected from the wide variety of available algorithms, and usually some hyperparameters have to be set. In a supervised setting, model search is guided by performance measures that rely on known class labels, such as accuracy. However, these measures are not applicable to clustering, as labels are usually not available. Instead, one might use internal validity measures, which rely only on properties intrinsic to the data set. Several such measures have been defined, and in this paper we study the usefulness of four of them for model selection. We perform experiments with these measures in combination with six clustering algorithms. While some measures are suited for use in hyperparameter optimization for some specific algorithms, we conclude that none of them is suited for comparing very different clustering algorithms.
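
To make the idea of label-free model selection concrete, the following is a minimal sketch of scoring candidate clusterings with an internal validity measure. The abstract does not name the four measures studied, so the silhouette coefficient is used here purely as an assumed, well-known example. The data points, the two candidate labelings, and the names `silhouette_score`, `well_separated`, and `mixed` are hypothetical and only stand in for the outputs of different algorithms or hyperparameter settings.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical candidate clusterings, e.g. from two different configurations.
candidates = {
    "well_separated": [0, 0, 0, 1, 1, 1],
    "mixed": [0, 1, 0, 1, 0, 1],
}

# Model selection without class labels: keep the candidate with the best score.
scores = {name: silhouette_score(points, labs) for name, labs in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Ranking candidates this way is most plausible within a single algorithm, for instance when tuning its hyperparameters. In line with the conclusion of the abstract, the same ranking should not be assumed to be reliable when the candidates come from very different clustering algorithms.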
