Lambda Means Clustering: Automatic Parameter Search And Distributed Computing Implementation

2016 23rd International Conference on Pattern Recognition (ICPR), 2016

Abstract
Recent advances in clustering have shown that enforcing a minimum separation between cluster centroids yields higher-quality clusters than methods that explicitly fix the number of clusters in advance, such as k-means. One such algorithm is DP-means, which uses a distance parameter lambda to set the minimum separation. However, without knowing either the true number of clusters or the true underlying distribution, choosing lambda can itself be difficult, and a poor choice degrades cluster quality. As a general solution for finding lambda, in this paper we present lambda-means, a clustering algorithm capable of deriving an optimal value for lambda automatically. We contribute both a theoretically motivated cluster-based version of lambda-means and a faster conflict-based version. We demonstrate that lambda-means asymptotically discovers the true underlying value of lambda when run on datasets generated by a Dirichlet process, and that it achieves competitive performance on a real-world test dataset. Further, we demonstrate that on both parallel multicore computers and distributed cluster computers in the cloud, cluster-based lambda-means achieves near-perfect speedup, while conflict-based lambda-means, although the more efficient algorithm, achieves speedups within a factor of two of the maximum possible.
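To illustrate the role lambda plays in DP-means, here is a minimal Python sketch of the commonly described DP-means procedure. It is not the lambda-means algorithm contributed by the paper, whose details the abstract does not give. The sketch assumes that lambda acts as a threshold on the squared Euclidean distance from a point to its nearest centroid, and that the first centroid is initialized at the global mean. The function name dp_means and its parameters are hypothetical.

```python
def dp_means(points, lam, max_iter=100):
    """Illustrative DP-means sketch (assumed formulation, not the paper's code).

    points: list of equal-length numeric tuples.
    lam:    threshold on squared Euclidean distance; a point farther than
            this from every centroid opens a new cluster.
    Returns (centroids, labels).
    """
    dim = len(points[0])
    n = len(points)
    # Assumption: start with a single centroid at the global mean.
    centroids = [tuple(sum(p[d] for p in points) / n for d in range(dim))]
    labels = [0] * n

    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            # Squared distance from p to every current centroid.
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            if dists[best] > lam:
                # Too far from all centroids: open a new cluster at p.
                centroids.append(tuple(p))
                best = len(centroids) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True

        # Recompute centroids as cluster means, dropping empty clusters
        # and renumbering the labels so they stay contiguous.
        groups = {}
        for i, k in enumerate(labels):
            groups.setdefault(k, []).append(points[i])
        order = sorted(groups)
        remap = {old: new for new, old in enumerate(order)}
        centroids = [
            tuple(sum(q[d] for q in groups[k]) / len(groups[k]) for d in range(dim))
            for k in order
        ]
        labels = [remap[k] for k in labels]

        if not changed:
            break
    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    cents, labs = dp_means(pts, lam=9.0)
    print(len(cents), labs)
```

In this sketch a small lambda splits the data into many clusters and a large lambda merges everything into one, which is the sensitivity that motivates searching for lambda automatically.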
Keywords
cluster-based λ-means, distributed cluster computers, parallel multicore computers, Dirichlet process, clustering algorithm, cluster quality, DP-means, cluster centroids, distributed computing implementation, automatic parameter search, lambda means clustering