A proposal for supervised clustering with Dirichlet Process using labels.

Pattern Recognition Letters (2016)

Highlights
- We propose to modify the prior process of clusters by taking label information into account.
- We heuristically modify the prior process of clusters based on a Polya urn model.
- We test on five real datasets with 30 random hold-out splits, obtaining good performance compared with other alternatives.
- We analyse the MCMC learning process on five real datasets in terms of log-likelihood and number of clusters.
- We recommend this variant because it performs better than clustering based on the Dirichlet Process.

Abstract
Supervised clustering is an emerging area of machine learning in which the goal is to find class-uniform clusters. However, typical state-of-the-art algorithms use a fixed number of clusters. In this work, we propose a variation of non-parametric Bayesian modeling for supervised clustering. Our approach models the clusters as a mixture of Gaussians, with a constraint that encourages clusters of points sharing the same label. To estimate the number of clusters, we assume a priori a countably infinite number of clusters, using a variation of the Dirichlet Process model as the prior distribution. Our experiments show that our technique typically outperforms other clustering techniques.
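The abstract does not spell out how the urn is modified, so the following is only an illustrative sketch under stated assumptions. It contrasts the standard Polya urn (Chinese restaurant process) prior of a Dirichlet Process, where the next point joins existing cluster k with probability n_k / (n + alpha) and opens a new cluster with probability alpha / (n + alpha), with a hypothetical label-aware rule in which each cluster's weight becomes n_k + beta * m_k, m_k being the number of its members that share the incoming point's label. The function names and the parameter `beta` are hypothetical, and the Gaussian likelihood and MCMC sampler used in the paper are deliberately omitted.

```python
import random
from collections import Counter


def polya_urn_weights(assignments, alpha):
    """Standard Dirichlet Process (Polya urn / CRP) prior for the next point.

    Existing cluster k gets n_k / (n + alpha); a new cluster gets
    alpha / (n + alpha), where n is the number of points seated so far.
    """
    n = len(assignments)
    weights = {k: c / (n + alpha) for k, c in Counter(assignments).items()}
    weights["new"] = alpha / (n + alpha)
    return weights


def label_aware_weights(assignments, labels, new_label, alpha, beta):
    """HYPOTHETICAL label-aware urn: an illustrative rule, not the paper's.

    Cluster k gets unnormalised weight n_k + beta * m_k, where m_k counts the
    members of k whose label equals new_label; a new cluster keeps weight
    alpha. With beta = 0 this reduces to the standard Polya urn above.
    """
    sizes = Counter(assignments)
    same = Counter(k for k, y in zip(assignments, labels) if y == new_label)
    raw = {k: sizes[k] + beta * same[k] for k in sizes}
    raw["new"] = alpha
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}


def sample_cluster(weights, rng):
    """Draw one cluster id (or "new") from a normalised weight dict."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


def seat_points(labels, alpha, beta, seed=0):
    """Seat labelled points one by one using only the hypothetical prior.

    No Gaussian likelihood is used here, so this shows only how the prior
    alone pushes same-label points towards the same clusters.
    """
    rng = random.Random(seed)
    assignments = []
    next_id = 0
    for i, y in enumerate(labels):
        w = label_aware_weights(assignments, labels[:i], y, alpha, beta)
        k = sample_cluster(w, rng)
        if k == "new":  # open a fresh cluster with the next unused id
            k = next_id
            next_id += 1
        assignments.append(k)
    return assignments


if __name__ == "__main__":
    assignments = [0, 0, 1]   # cluster of each already-seated point
    labels = ["a", "a", "b"]  # class label of each already-seated point
    print(polya_urn_weights(assignments, alpha=1.0))
    # {0: 0.5, 1: 0.25, 'new': 0.25}
    print(label_aware_weights(assignments, labels, "b", alpha=1.0, beta=2.0))
    # {0: 0.3333333333333333, 1: 0.5, 'new': 0.16666666666666666}
    print(seat_points(["a", "a", "b", "b", "a"], alpha=1.0, beta=2.0, seed=1))
```

In the example, an incoming point labelled "b" sees the weight of the single-member "b" cluster rise from 0.25 to 0.5, while the larger "a" cluster drops from 0.5 to about 0.33. Multiplying or adding a label term to the urn counts is only one of several plausible ways to "encourage clusters of points with the same label"; the paper's actual heuristic may differ.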
Keywords
Dirichlet Process, Supervised clustering, Clustering