Efficient Correlation Clustering Methods for Large Consensus Clustering Instances

Nathan Cordner,George Kollios

CoRR(2023)

引用 0|浏览10
暂无评分
摘要
Consensus clustering (or clustering aggregation) inputs $k$ partitions of a given ground set $V$, and seeks to create a single partition that minimizes disagreement with all input partitions. State-of-the-art algorithms for consensus clustering are based on correlation clustering methods like the popular Pivot algorithm. Unfortunately these methods have not proved to be practical for consensus clustering instances where either $k$ or $V$ gets large. In this paper we provide practical run time improvements for correlation clustering solvers when $V$ is large. We reduce the time complexity of Pivot from $O(|V|^2 k)$ to $O(|V| k)$, and its space complexity from $O(|V|^2)$ to $O(|V| k)$ -- a significant savings since in practice $k$ is much less than $|V|$. We also analyze a sampling method for these algorithms when $k$ is large, bridging the gap between running Pivot on the full set of input partitions (an expected 1.57-approximation) and choosing a single input partition at random (an expected 2-approximation). We show experimentally that algorithms like Pivot do obtain quality clustering results in practice even on small samples of input partitions.
更多
查看译文
关键词
large consensus clustering,efficient correlation clustering methods
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要