Data clustering using hidden variables in hybrid Bayesian networks
Progress in Artificial Intelligence (2014)
Abstract
In this paper, we analyze the problem of data clustering in domains where discrete and continuous variables coexist. We propose the use of hybrid Bayesian networks with a naïve Bayes structure and a hidden class variable. The model integrates discrete and continuous features by representing the conditional distributions as mixtures of truncated exponentials (MTEs). The number of classes is determined through an iterative procedure based on a variation of the data augmentation algorithm. The new model is compared with an EM-based clustering algorithm in which each class model is a product of conditionally independent probability distributions and the number of clusters is chosen by cross-validation. Experiments carried out on real-world and synthetic data sets show that the proposal is competitive with state-of-the-art methods. Although the methodology introduced in this manuscript is based on MTEs, it can easily be instantiated with other similar models, such as mixtures of polynomials or, more generally, mixtures of truncated basis functions.
Keywords
Probabilistic clustering, Mixtures of truncated exponentials, Unsupervised classification, Hybrid Bayesian networks
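To make the model described in the abstract more concrete, the following is a minimal sketch of how a naïve Bayes structure with a hidden class variable could assign a hybrid record to clusters when continuous features are modeled with univariate MTE densities of the form a0 + Σ a_i·exp(b_i·x) on each interval. It is not the authors' implementation: the function names, the data layout, and every parameter value are hypothetical and chosen only for illustration, and the sketch covers only the class-posterior computation, not the data-augmentation procedure used to learn the model or to select the number of classes.

```python
import math


def mte_density(x, pieces):
    """Evaluate a univariate MTE density at x.

    `pieces` is a list of (lo, hi, a0, terms) tuples, where `terms` is a
    list of (a, b) pairs; on [lo, hi) the density is a0 + sum(a * exp(b * x)).
    Outside every interval the density is 0.
    """
    for lo, hi, a0, terms in pieces:
        if lo <= x < hi:
            return a0 + sum(a * math.exp(b * x) for a, b in terms)
    return 0.0


def class_posterior(prior, discrete_cpts, mte_models, x_discrete, x_continuous):
    """Posterior over the hidden class under the naive Bayes assumption.

    prior[c]               -> P(C = c)
    discrete_cpts[j][c][v] -> P(X_j = v | C = c) for discrete feature j
    mte_models[k][c]       -> MTE pieces of f(Y_k | C = c) for continuous feature k
    """
    scores = []
    for c, p_c in enumerate(prior):
        score = p_c
        for table, value in zip(discrete_cpts, x_discrete):
            score *= table[c][value]
        for model, value in zip(mte_models, x_continuous):
            score *= mte_density(value, model[c])
        scores.append(score)
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical two-class model with one binary and one continuous feature on [0, 1).
# Each coefficient is chosen so that the single-term MTE integrates to 1 on [0, 1).
a_dec = 2.0 / (1.0 - math.exp(-2.0))   # normalizes exp(-2x)
a_inc = 2.0 / (math.exp(2.0) - 1.0)    # normalizes exp(2x)
decreasing = [(0.0, 1.0, 0.0, [(a_dec, -2.0)])]
increasing = [(0.0, 1.0, 0.0, [(a_inc, 2.0)])]

prior = [0.5, 0.5]
discrete_cpts = [[[0.8, 0.2], [0.3, 0.7]]]
mte_models = [[decreasing, increasing]]

posterior = class_posterior(prior, discrete_cpts, mte_models, [0], [0.1])
print(posterior)
```

In this toy setting, a record with discrete value 0 and a small continuous value is assigned mostly to the first class, since both features favor it. In a clustering procedure, posteriors of this kind would typically be used to impute or weight the hidden class for each record before re-estimating the parameters.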