Information theoretic clustering of sparse co-occurrence data

International Conference on Data Mining (2003)

Abstract
A novel approach to clustering co-occurrence data poses it as an optimization problem in information theory which minimizes the resulting loss in mutual information. A divisive clustering algorithm that monotonically reduces this loss function was recently proposed. In this paper we show that sparse high-dimensional data presents special challenges which can result in the algorithm getting stuck at poor local minima. We propose two solutions to this problem: (a) a "prior" to overcome infinite relative entropy values, as in the supervised Naive Bayes algorithm, and (b) local search to escape local minima. Finally, we combine these solutions to get a robust algorithm that is computationally efficient. We present experimental results to show that the proposed method is effective in clustering document collections and outperforms previous information-theoretic clustering approaches.
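The abstract describes a divisive, k-means-like procedure: each row of the co-occurrence matrix is represented by its conditional distribution p(y|x) and assigned to the cluster centroid that is closest in relative entropy (KL divergence), and a smoothing prior keeps the KL values finite on sparse data. The sketch below is a minimal illustration of that idea, not the authors' implementation; the function name `it_cluster`, the parameter `prior_weight`, and the uniform-prior smoothing are assumptions made for the example, and the paper's local-search refinement is omitted.

```python
# Minimal sketch of divisive information-theoretic clustering of a
# document-word co-occurrence count matrix (rows = documents).
import numpy as np

def it_cluster(counts, n_clusters, prior_weight=0.1, n_iter=50, seed=0):
    """Assign each row's conditional distribution p(y|x) to the cluster
    centroid with the smallest KL divergence. A small uniform prior smooths
    zero counts so that KL(p(y|x) || centroid) never becomes infinite on
    sparse data (solution (a) in the abstract)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = counts.shape

    # Row priors p(x) and smoothed conditionals p(y|x).
    row_mass = counts.sum(axis=1, keepdims=True)
    p_x = (row_mass / counts.sum()).ravel()
    p_y_given_x = (counts + prior_weight) / (row_mass + prior_weight * n_cols)

    labels = rng.integers(n_clusters, size=n_rows)        # random initial split
    for _ in range(n_iter):
        # Cluster centroids p(y|cluster): p(x)-weighted mean of member rows.
        centroids = np.empty((n_clusters, n_cols))
        for k in range(n_clusters):
            members = labels == k
            if not members.any():                          # re-seed empty cluster
                members = rng.integers(n_rows, size=1)
            w = p_x[members][:, None]
            centroids[k] = (w * p_y_given_x[members]).sum(axis=0) / w.sum()

        # KL(p(y|x) || p(y|cluster)) for every row/cluster pair; the smoothing
        # above guarantees both arguments are strictly positive.
        kl = (p_y_given_x[:, None, :] *
              (np.log(p_y_given_x[:, None, :]) - np.log(centroids[None]))).sum(-1)
        new_labels = kl.argmin(axis=1)
        if np.array_equal(new_labels, labels):             # converged
            break
        labels = new_labels
    return labels
```

For example, calling `it_cluster(counts, n_clusters=3)` on a documents-by-words count matrix returns one cluster label per document; each reassignment step cannot increase the loss in mutual information, which is what the abstract means by the algorithm monotonically reducing the loss function.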
Keywords
bayes methods, information theory, learning (artificial intelligence), optimisation, pattern clustering, divisive clustering algorithm, document clustering, local minima, sparse high-dimensional co-occurrence data, supervised naive bayes algorithm, loss function, high dimensional data, relative entropy, local search, probability distribution, naive bayes, optimization problem, mutual information, random variable