On Euclidean $k$-Means Clustering with $\alpha$-Center Proximity.

arXiv: Data Structures and Algorithms (2018)

Abstract
$k$-means clustering is NP-hard in the worst case, but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are \emph{stable} under additive or multiplicative perturbation of data. This has two caveats. First, we do not know how to efficiently verify this property of optimal solutions that are NP-hard to compute in the first place. Second, the stability assumptions required for polynomial time $k$-means algorithms are often unreasonable when compared to the ground-truth clusters in real-world data. A consequence of multiplicative perturbation resilience is \emph{center proximity}, that is, every point is closer to the center of its own cluster than the center of any other cluster, by some multiplicative factor $\alpha > 1$. We study the problem of minimizing the Euclidean $k$-means objective only over clusterings that satisfy $\alpha$-center proximity. We give a simple algorithm to find the optimal $\alpha$-center-proximal $k$-means clustering in running time exponential in $k$ and $1/(\alpha - 1)$ but linear in the number of points and the dimension. We define an analogous $\alpha$-center proximity condition for outliers, and give similar algorithmic guarantees for $k$-means with outliers and $\alpha$-center proximity. On the hardness side we show that for any $\alpha' > 1$, there exists an $\alpha \leq \alpha'$ (with $\alpha > 1$) and an $e_0 > 0$ such that minimizing the $k$-means objective over clusterings that satisfy $\alpha$-center proximity is NP-hard to approximate within a multiplicative $(1+e_0)$ factor.
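To make the $\alpha$-center proximity condition concrete, the following is a minimal sketch of a checker for a given clustering, not the algorithm from the paper. It assumes that each center is the mean of its cluster and that the condition is the strict inequality $\alpha \cdot \lVert x - c_i \rVert < \lVert x - c_j \rVert$ for every point $x$ in cluster $i$ and every $j \neq i$. The function names `centroid`, `satisfies_center_proximity`, and `kmeans_cost` are illustrative and do not come from the source.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def satisfies_center_proximity(clusters, alpha):
    """Check alpha-center proximity for a clustering (a list of point lists).

    Assumption: a point x in cluster i passes when
    alpha * dist(x, c_i) < dist(x, c_j) for every other center c_j,
    where each center is the mean of its cluster.
    """
    centers = [centroid(cluster) for cluster in clusters]
    for i, cluster in enumerate(clusters):
        for x in cluster:
            own = math.dist(x, centers[i])
            for j, other_center in enumerate(centers):
                if j != i and not (alpha * own < math.dist(x, other_center)):
                    return False
    return True


def kmeans_cost(clusters):
    """Euclidean k-means objective: sum of squared distances to cluster means."""
    total = 0.0
    for cluster in clusters:
        center = centroid(cluster)
        total += sum(math.dist(x, center) ** 2 for x in cluster)
    return total


# Two well-separated clusters on a line.
clusters = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 0.0), (11.0, 0.0)]]
print(satisfies_center_proximity(clusters, 2.0))   # condition holds
print(satisfies_center_proximity(clusters, 20.0))  # condition fails
print(kmeans_cost(clusters))
```

In this toy instance the tightest point is `(1.0, 0.0)`, which lies at distance 0.5 from its own center and 9.5 from the other, so the clustering is $\alpha$-center-proximal only for $\alpha < 19$. The checker verifies a given clustering in time linear in the number of points for fixed $k$ and dimension; it does not search for the optimal $\alpha$-center-proximal clustering, which is the problem the abstract addresses.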