Spectral clustering with limited independence
SODA(2007)
Abstract
This paper considers the well-studied problem of clustering a set of objects under a probabilistic model of data in which each object is represented as a vector over the set of features, and there are only k different types of objects. Earlier results (on mixture models and "planted" problems on graphs) often assumed that all coordinates of all objects are independent random variables, and then appealed to the theory of random matrices to infer spectral properties of the feature × object matrix. In most practical applications, however, assuming full independence is not realistic. Instead, we assume only that the objects are independent; the coordinates of each object need not be. We first generalize the required results for random matrices to this case of limited independence, using some new techniques developed in functional analysis. Surprisingly, we are able to prove results that are quite similar to those for the fully independent case, modulo an extra logarithmic factor. Using these bounds, we develop clustering algorithms for the more general mixture models. Our clustering algorithms have a substantially different and perhaps simpler "clean-up" phase than known algorithms. We show that our model subsumes not only the planted partition random graph models, but also another set of models for which there is a body of clustering algorithms, namely the Gaussian and log-concave mixture models.
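The setting in the abstract can be illustrated with a small numerical sketch. This is not the paper's algorithm, and it does not reproduce its "clean-up" phase; it is a generic spectral pipeline (project onto the top-k left singular vectors, then run plain Lloyd's k-means) on synthetic data, under stated assumptions. The function names, the Gaussian noise, the `mixing` matrix used to correlate coordinates within an object, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def sample_mixture(n_features, n_objects, k, separation, rng):
    """Return a feature x object matrix and the true type of each object.

    Columns (objects) are independent of each other, but the coordinates
    inside one column are correlated through a shared mixing matrix.
    This noise model is an assumption made for the illustration.
    """
    centers = separation * rng.standard_normal((n_features, k))
    labels = rng.integers(0, k, size=n_objects)
    mixing = rng.standard_normal((n_features, n_features)) / np.sqrt(n_features)
    noise = mixing @ rng.standard_normal((n_features, n_objects))
    return centers[:, labels] + noise, labels


def nearest_center(points, centers):
    """Index of the closest center for each row of `points`."""
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        assign = nearest_center(points, centers)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = points[assign == j].mean(axis=0)
    return nearest_center(points, centers)


def spectral_cluster(A, k):
    """Project the columns of A onto the top-k left singular subspace,
    then cluster the projected objects with k-means."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    projected = U[:, :k].T @ A          # shape (k, n_objects)
    return kmeans(projected.T, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, labels = sample_mixture(50, 300, 3, 3.0, rng)
    pred = spectral_cluster(A, 3)
    # Cluster ids are arbitrary, so compare the partitions, not the ids.
    print(sorted(set(zip(pred.tolist(), labels.tolist()))))
```

With well-separated centers, as in the parameters above, each predicted cluster should line up with exactly one true type. The sketch says nothing about how small the separation can be made; that is what the paper's bounds address.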
Keywords
general mixture model,limited independence,object matrix,spectral clustering,full independence,log-concave mixture model,random matrix,mixture model,partition random graph model,clustering algorithm,independent random variable,independent case modulo,probabilistic model,random matrices,independent set,functional analysis