$k$-Center Clustering with Outliers in the MPC and Streaming Model

arxiv(2023)

引用 0|浏览16
暂无评分
摘要
Given a point set $P \subseteq X$ of size $n$ in a metric space $(X,dist)$ of doubling dimension $d$ and two parameters $k \in N$ and $z \in N$, the $k$-center problem with $z$ outliers asks to return a set $C^\ast \subseteq X$ of $k$ centers such that the maximum distance of all but $z$ points of $P$ to their nearest center in $C^\ast$ is minimized. An $(\epsilon,k,z)$-coreset for this problem is a weighted point set $P^*$ such that an optimal solution for the $k$-center problem with $z$ outliers on $P^*$ gives a $(1\pm\epsilon)$-approximation for the $k$-center problem with $z$ outliers on $P$. We study the construction of such coresets in the Massively Parallel Computing (MPC) model, and in the insertion-only as well as the fully dynamic streaming model. We obtain the following results, for any given $0 < \epsilon \le 1$: In all cases, the size of the computed coreset is $O(k/\epsilon^d+z)$. - In the MPC model, we present a deterministic $2$-round and a randomized $1$-round algorithm. Additionally, we provide a deterministic algorithm that obtains a trade-off between the number of rounds, $R$, and the storage per machine. - For the insertion-only streaming model, we present an algorithm and a tight lower bound to support it. - We also discuss the dynamic streaming model, which allows both insertions and deletions in the data stream. In this model, we present the first algorithm and a lower bound. - Finally, we consider the sliding window model, where we are interested in maintaining an $(\epsilon,k,z)$-coreset for the last $W$ points in the stream, we present a tight lower bound that confirms the optimality of the previous work by De Berg, Monemizadeh, and Zhong (ESA2020).
更多
查看译文
关键词
outliers,mpc
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要