Learning Mixtures of Product Distributions Using Correlations and Independence

COLT (2008)

Abstract
We study the problem of learning mixtures of distributions, a natural formalization of clustering. A mixture of distributions is a collection of distributions D = {D1, . . . , DT} and mixing weights {w1, . . . , wT} such that Σi wi = 1. A sample from a mixture is generated by choosing i with probability wi and then choosing a sample from distribution Di. The problem of learning the mixture is that of finding the parameters of the distributions comprising D, given only the ability to sample from the mixture. In this paper, we restrict ourselves to learning mixtures of product distributions. The key to learning the mixtures is to find a few vectors such that points from different distributions are sharply separated upon projection onto these vectors. Previous techniques use the vectors corresponding to the top few directions of highest variance of the mixture. Unfortunately, these directions may be directions of high noise and not directions along which the distributions are separated. Further, skewed mixing weights amplify the effects of noise, and as a result, previous techniques only work when the separation between the input distributions is large relative to the imbalance in the mixing weights. In this paper, we show an algorithm which successfully learns mixtures of distributions with a separation condition that depends only logarithmically on the skewed mixing weights. In particular, it succeeds for a separation between the centers that is Ω(σ√(T log Λ)), where σ is the maximum directional standard deviation of any distribution in the mixture, T is the number of distributions, and Λ is polynomial in T, σ, log n and the imbalance in the mixing weights. For our algorithm to succeed, we require a spreading condition: that the distance between the centers be spread across Ω(T log Λ) coordinates.
Additionally, with arbitrarily small separation, i.e., even when the separation is not enough for clustering, with enough samples, we can approximate the subspace containing the centers. Previous techniques failed to do so in polynomial time for non-spherical distributions regardless of the number of samples, unless the separation was large with respect to the maximum directional variance σ and polynomially large with respect to the imbalance of mixing weights. Our algorithm works for Binary Product Distributions and Axis-Aligned Gaussians. The spreading condition above is implied by the separation condition for binary product distributions, and is necessary for algorithms that rely on linear correlations.
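The generative model described in the abstract (choose component i with probability wi, then draw each coordinate independently, since Di is a product distribution) can be sketched as follows for the binary case; the function name and parameter values here are illustrative, not from the paper:

```python
import random

def sample_mixture(weights, dists, n, rng=None):
    """Draw n samples from a mixture of binary product distributions.

    weights: mixing weights w_1..w_T summing to 1.
    dists:   list of T parameter vectors; dists[i][j] is the probability
             that coordinate j equals 1 under distribution D_i.
    """
    rng = rng or random.Random(0)
    samples = []
    for _ in range(n):
        # Choose component i with probability w_i ...
        i = rng.choices(range(len(weights)), weights=weights)[0]
        # ... then draw each coordinate independently (product distribution).
        samples.append([1 if rng.random() < p else 0 for p in dists[i]])
    return samples

# Two binary product distributions in 4 dimensions with skewed
# mixing weights (0.9 vs 0.1), the regime the paper targets.
data = sample_mixture([0.9, 0.1],
                      [[0.9, 0.9, 0.1, 0.1],
                       [0.1, 0.1, 0.9, 0.9]],
                      n=5)
```

Note that with such skewed weights, the minority component contributes little to the overall variance, which is why top-variance directions can miss the separating directions.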
Keywords
polynomial time, col, product distribution