Clustering algorithms for random and pseudo-random structures (2008)

Abstract
Partitioning a set of objects into a number of clusters according to a suitable distance metric is one of the major problems in data mining, information retrieval and other fields of computer science. This thesis describes and analyzes algorithms for clustering and partitioning data generated from random and pseudo-random models. In random models, one assumes that the data matrix to be partitioned is generated from a simple distribution. A common feature of all the algorithms analyzed is that they are spectral algorithms: they employ information about the spectrum of the data matrix to perform clustering. We provide new results in a number of directions. A method based on the second singular vector is analyzed for a mixture model for graphs. The use of the notion of pseudo-randomness is another important aspect of our work. Pseudo-randomness, the idea of using deterministic definitions to capture properties of randomness, is used to extend the notion of probabilistic models, making it possible to model clustering problems for sparse (constant-degree) graphs. This work also proposes the first known geometric, projection-based algorithms for discrete distributions, which allow a number of generalizations. In addition, entry-wise bounds for eigenvectors of adjacency matrices of random graphs are studied, and their implications for spectral algorithms are considered.
Keywords
random graph, data mining, clustering algorithm, information retrieval, random model, partitioning data, pseudo-random structure, mixture model, spectral algorithm, adjacency matrix, clustering problem, data matrix