Diffusion in computer science and statistics

Diffusion in computer science and statistics (2009)

Abstract
In this thesis, we investigate diffusion as an algorithmic and analytic tool in statistics and computer science. We address a question arising from computational linguistics, where we wish to understand the behavior of a network of agents, modeled as nodes of a graph, that adaptively modify their lexicon using data from their neighbors. By introducing a model of memory and a family of coalescing random walks, we prove that the agents eventually reach a consensus with probability 1. We study distributed averaging on graphs and devise a distributed algorithm based on a diffusion process having two time scales. Addressing the question of routing in a network, we use steady-state diffusions corresponding to electrical flow in a network of resistors for oblivious routing, and prove that this scheme performs well under a variety of performance measures. Based on a microscopic view of diffusion as an ensemble of particles executing independent Brownian motions, we develop the fastest currently known algorithm for computing the area of the boundary of a convex set. A similar technique is used to produce samplers for the boundaries of convex sets and for smooth hypersurfaces that are the boundaries of open sets in R^n, assuming access to samplers for the interior. These algorithms are motivated by goodness-of-fit tests in statistics. The half-plane capacity, a quantity often used to parameterize Schramm-Loewner evolutions (stochastic processes arising in statistical physics), is shown to be comparable to a more geometric notion. We analyze a class of natural random walks on a Riemannian manifold and give bounds on the mixing times in terms of the Cheeger constant and a notion of smoothness that relates the random walk to the metric underlying the manifold. We develop a Markov chain whose stationary distribution is uniform on the interior of a polytope. This is the first chain whose mixing time is strongly polynomial when initiated in the vicinity of the center of mass. This Markov chain can be interpreted as a random walk on a certain Riemannian manifold. The resulting algorithm for sampling polytopes outperforms known algorithms when the number of constraints is of the same order of magnitude as the dimension. We use a variant of this Markov chain to design a randomized version of Dikin's affine scaling algorithm for linear programming, and we provide polynomial-time guarantees that do not exist for Dikin's algorithm. Addressing a question from machine learning, we prove that, under certain smoothness conditions, a form of weighted surface area is the limit of the weight of graph cuts in a family of random graphs arising in the context of clustering. This is done by relating both quantities to the amount of diffusion across the surface in question. Addressing a related issue on manifolds, we obtain an upper bound on the annealed entropy of the collection of open subsets of a manifold whose boundaries are well-conditioned. This result leads to an upper bound on the number of random samples needed before it is possible to accurately classify data lying on a manifold.
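To make the polytope-sampling chain mentioned in the abstract more concrete, here is a minimal two-dimensional sketch of a walk of this general kind, often called a Dikin walk. It is an illustration under stated assumptions, not the thesis's exact construction: proposals are drawn uniformly from the Dikin ellipsoid of the log-barrier Hessian, and a Metropolis filter makes the uniform distribution stationary. The function names, the radius `r = 0.075`, and the unit-square example are all choices made here for illustration.

```python
import math
import random

def slacks(A, b, x):
    """Return b_i - a_i . x for each constraint a_i . x <= b_i (2D only)."""
    return [bi - (ai[0] * x[0] + ai[1] * x[1]) for ai, bi in zip(A, b)]

def barrier_hessian(A, b, x):
    """Log-barrier Hessian sum_i a_i a_i^T / s_i^2, as (h11, h12, h22)."""
    h11 = h12 = h22 = 0.0
    for ai, s in zip(A, slacks(A, b, x)):
        w = 1.0 / (s * s)
        h11 += w * ai[0] * ai[0]
        h12 += w * ai[0] * ai[1]
        h22 += w * ai[1] * ai[1]
    return h11, h12, h22

def det(H):
    h11, h12, h22 = H
    return h11 * h22 - h12 * h12

def local_norm_sq(H, d):
    """Squared length of d in the local metric, d^T H d."""
    h11, h12, h22 = H
    return h11 * d[0] * d[0] + 2.0 * h12 * d[0] * d[1] + h22 * d[1] * d[1]

def propose(H, x, r, rng):
    """Draw a point uniformly from the ellipsoid {y : (y-x)^T H (y-x) <= r^2}."""
    while True:  # uniform point in the unit disc by rejection
        u1, u2 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if u1 * u1 + u2 * u2 <= 1.0:
            break
    h11, h12, h22 = H
    # Cholesky factor H = L L^T, then solve L^T z = r * u.
    l11 = math.sqrt(h11)
    l21 = h12 / l11
    l22 = math.sqrt(h22 - l21 * l21)
    z2 = r * u2 / l22
    z1 = (r * u1 - l21 * z2) / l11
    return (x[0] + z1, x[1] + z2)

def dikin_walk(A, b, x0, steps, r=0.075, seed=0):
    """Run the walk from an interior point x0 and return all visited points."""
    rng = random.Random(seed)
    x = tuple(x0)
    Hx = barrier_hessian(A, b, x)
    path = [x]
    for _ in range(steps):
        y = propose(Hx, x, r, rng)
        Hy = barrier_hessian(A, b, y)
        back = (x[0] - y[0], x[1] - y[1])
        # Reject if x is outside y's ellipsoid (the reverse move is impossible).
        if local_norm_sq(Hy, back) <= r * r:
            # Metropolis ratio vol(D_x) / vol(D_y) = sqrt(det H(y) / det H(x)).
            if rng.random() < min(1.0, math.sqrt(det(Hy) / det(Hx))):
                x, Hx = y, Hy
        path.append(x)
    return path

# Hypothetical example: the unit square 0 <= x1, x2 <= 1.
SQUARE_A = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
SQUARE_B = [1.0, 0.0, 1.0, 0.0]
samples = dikin_walk(SQUARE_A, SQUARE_B, (0.5, 0.5), steps=2000)
```

Because a Dikin ellipsoid of radius less than 1 lies inside the polytope, every proposal is feasible and no explicit boundary check is needed. The step size adapts automatically near the faces, which is the intuition behind reading the chain as a random walk on a Riemannian manifold.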
Keywords
Riemannian manifold, random sample, random graph, affine scaling algorithm, random walk, certain Riemannian manifold, resulting algorithm, natural random walk, convex set, computer science, Markov chain