Robust Convex Clustering Analysis

2016 IEEE 16th International Conference on Data Mining (ICDM), 2016

Abstract
Clustering is an unsupervised learning approach that explores data and seeks groups of similar objects. Many classical clustering models, such as k-means and DBSCAN, are based on heuristic algorithms and suffer from locally optimal solutions and numerical instability. Recently, convex clustering, which leverages sparsity-inducing norms and enjoys many attractive theoretical properties, has received increasing attention. However, convex clustering is based on Euclidean distance and is thus not robust against outlier features. Because outlier features are very common, especially when dimensionality is high, this vulnerability has greatly limited the applicability of convex clustering to many real-world datasets. In this paper, we address the challenge by proposing a novel robust convex clustering method that simultaneously performs convex clustering and identifies outlier features. Specifically, the proposed method learns to decompose the data matrix into a clustering structure component and a group sparse component that captures feature outliers. We develop a block coordinate descent algorithm that iteratively performs convex clustering after outlier features are identified and eliminated. We also propose an efficient algorithm for solving the convex clustering problem by exploiting the structure of its dual problem. Moreover, to further illustrate statistical stability, we present a theoretical performance bound for the proposed clustering method. Empirical studies on synthetic and real-world data demonstrate that the proposed robust convex clustering method can detect feature outliers and improve cluster quality.
Keywords
robust convex clustering analysis, unsupervised learning approach, k-means clustering models, DBSCAN, heuristic algorithms, locally optimal solutions, numerical instability, sparsity-inducing norms, Euclidean distance, outlier features, data matrix decomposition, clustering structure component, group sparse component, feature outliers, block coordinate descent algorithm, statistical stability, theoretical performance bound
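
The abstract describes decomposing the data matrix into a clustering structure component and a group sparse component that captures feature outliers, but it does not state the objective. The following is a minimal sketch of what the outlier-identification block of such a scheme might look like, under loudly stated assumptions: the fit term is a squared Frobenius norm, the group sparse penalty is a sum of column-wise Euclidean norms, and the names `X`, `P`, `Q`, and `beta` are hypothetical, not taken from the paper. Under those assumptions, the update of the group sparse component with the clustering component held fixed has a standard closed form, column-wise group soft-thresholding. The convex clustering step itself is not shown.

```python
import numpy as np


def group_soft_threshold_columns(R, beta):
    """Shrink each column of R toward zero by beta in Euclidean norm.

    This is the proximal operator of beta * sum_j ||R[:, j]||_2, an
    ASSUMED form of the group sparse penalty (not confirmed by the abstract).
    """
    norms = np.linalg.norm(R, axis=0)
    # Scale factor max(0, 1 - beta / ||r_j||); columns with norm <= beta become zero.
    scale = np.maximum(0.0, 1.0 - beta / np.maximum(norms, 1e-12))
    return R * scale


def flag_outlier_features(X, P, beta):
    """Given data X and a fixed clustering component P (both n_samples x n_features),
    return the group sparse component Q and the indices of features (columns)
    whose residual is large enough to survive thresholding.

    X, P, Q, and beta are hypothetical names chosen for illustration.
    """
    Q = group_soft_threshold_columns(X - P, beta)
    outliers = np.flatnonzero(np.linalg.norm(Q, axis=0) > 0)
    return Q, outliers
```

In a block coordinate descent loop of the kind the abstract mentions, a step like this would alternate with a convex clustering solve on the data with the flagged component removed. Columns of `Q` that remain nonzero are the candidate outlier features; a larger `beta` flags fewer features.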