Approximation Schemes for Clustering with Outliers

Society for Industrial and Applied Mathematics eBooks (2017)

Abstract
Clustering problems are well studied in a variety of fields such as data science, operations research, and computer science. Such problems include variants of centre location problems, $k$-median, and $k$-means, to name a few. In some cases, not all data points need to be clustered; some may be discarded for various reasons. We study clustering problems with outliers. More specifically, we look at Uncapacitated Facility Location (UFL), $k$-Median, and $k$-Means. In UFL with outliers, we have to open some centres, discard up to $z$ points of $\mathcal{X}$, and assign every other point to the nearest open centre, minimizing the total assignment cost plus centre opening costs. In $k$-Median and $k$-Means, we have to open up to $k$ centres but there are no opening costs. In $k$-Means, the cost of assigning $j$ to $i$ is $\delta^2(j,i)$. We present several results. Our main focus is on cases where $\delta$ is a doubling metric or the shortest-path metric of a graph from a minor-closed family of graphs. For uniform-cost UFL with outliers on such metrics, we show that a simple multiswap local search heuristic yields a PTAS. With a bit more work, we extend this to bicriteria approximations for the $k$-Median and $k$-Means problems in the same metrics where, for any constant $\epsilon > 0$, we can find a solution using $(1+\epsilon)k$ centres whose cost is at most $(1+\epsilon)$ times the optimum and which uses at most $z$ outliers. We also show that natural local search heuristics that do not violate the number of clusters and outliers for $k$-Median (or $k$-Means) have an unbounded gap even in Euclidean metrics. Furthermore, we show how our analysis can be extended to general metrics for $k$-Means with outliers to obtain a $(25+\epsilon,1+\epsilon)$ bicriteria approximation.
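To make the objectives in the abstract concrete, the following is a minimal sketch of how the cost of a fixed set of open centres could be evaluated when up to $z$ points may be discarded. It is an illustration only, not the paper's algorithm: the function name `cost_with_outliers`, its parameters, and the use of Euclidean distance for $\delta$ are all assumptions made for this example. It relies on one simple fact: once the centres are fixed, the best points to discard are the $z$ points farthest from their nearest open centre.

```python
import math


def cost_with_outliers(points, centres, z, power=1, opening_cost=0.0):
    """Evaluate a fixed set of open centres when up to z points may be discarded.

    Hypothetical helper for illustration; delta is assumed to be Euclidean.
    power=1 gives the k-Median assignment cost, power=2 the k-Means cost.
    opening_cost > 0 models uniform-cost UFL (the same cost per open centre).
    """
    # Distance from every point to its nearest open centre.
    nearest = [min(math.dist(p, c) for c in centres) for p in points]

    # For fixed centres, discarding the z farthest points is optimal,
    # so keep only the len(points) - z smallest distances.
    nearest.sort()
    kept = nearest[: max(len(nearest) - z, 0)]

    assignment = sum(d ** power for d in kept)
    return assignment + opening_cost * len(centres)


# Toy instance: two tight groups plus one far-away point.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 0)]
centres = [(0, 0), (10, 0)]

k_median_no_outliers = cost_with_outliers(points, centres, z=0)
k_median_one_outlier = cost_with_outliers(points, centres, z=1)
k_means_one_outlier = cost_with_outliers(points, centres, z=1, power=2)
```

On this toy instance a single far-away point dominates the cost when $z = 0$, and allowing one outlier removes it. A local search heuristic of the kind the abstract mentions would call such an evaluation repeatedly while swapping centres in and out; that search loop is deliberately not sketched here.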
Keywords
k-means, outliers, local search