Generate pairwise constraints from unlabeled data for semi-supervised clustering

Abdul Masud, Joshua Zhexue Huang, Ming Zhong, Xianghua Fu

Data & Knowledge Engineering (2019)

Abstract
Pairwise constraint selection methods often rely on the label information of data to generate pairwise constraints. This paper proposes a new method of selecting pairwise constraints from unlabeled data for semi-supervised clustering to improve clustering accuracy. Given a dataset without any label information, the data is first clustered into a set of initial clusters using the I-nice method. From each initial cluster, a dense group of objects is obtained by removing the faraway objects. Then, in each dense group, the most informative object and the other informative objects are identified with the local density estimation method. The identified objects are used to form a set of pairwise constraints, which are incorporated into the semi-supervised clustering algorithm to guide the clustering process toward a better solution. The advantage of this method is that no label information is required for selecting pairwise constraints. Experimental results demonstrate that the new method improved the clustering accuracy and outperformed four state-of-the-art pairwise constraint selection methods, namely, random, FFQS, min–max, and NPU, on both synthetic and real-world datasets.
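The pipeline described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the initial clusters are taken as given (the I-nice step is not reproduced), "faraway" objects are assumed to be those farthest from the cluster centroid, local density is approximated by the mean distance to the k nearest neighbours, and the rule for turning selected objects into must-link and cannot-link pairs is an assumption made only for illustration. The names `dense_group`, `rank_by_density`, `build_constraints` and the parameters `keep_ratio`, `k`, `n_informative` are hypothetical.

```python
import math
from itertools import combinations


def dense_group(cluster, keep_ratio=0.8):
    """Keep the objects closest to the centroid (assumed meaning of
    'removing the faraway objects'); keep_ratio is a hypothetical knob."""
    dim = len(cluster[0])
    centroid = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
    ranked = sorted(cluster, key=lambda p: math.dist(p, centroid))
    keep = max(2, math.ceil(keep_ratio * len(ranked)))
    return ranked[:keep]


def rank_by_density(group, k=3):
    """Order objects from densest to sparsest, using the mean distance to
    the k nearest neighbours as a stand-in local density estimate."""
    def mean_knn(i):
        dists = sorted(math.dist(group[i], group[j])
                       for j in range(len(group)) if j != i)
        if not dists:
            return math.inf
        k_eff = min(k, len(dists))
        return sum(dists[:k_eff]) / k_eff

    order = sorted(range(len(group)), key=mean_knn)
    return [group[i] for i in order]


def build_constraints(initial_clusters, n_informative=2):
    """Assumed constraint rule: must-link among the selected objects of one
    dense group, cannot-link between the densest objects of different groups."""
    must_link, cannot_link, selected = [], [], []
    for cluster in initial_clusters:
        ranked = rank_by_density(dense_group(cluster))
        chosen = ranked[: 1 + n_informative]  # densest object + a few others
        must_link.extend(combinations(chosen, 2))
        selected.append(chosen)
    for a, b in combinations(selected, 2):
        cannot_link.append((a[0], b[0]))
    return must_link, cannot_link
```

No labels are consulted at any point in this sketch; the resulting pairs would then be passed to a constrained clustering algorithm of the reader's choice.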
Keywords
Constrained clustering, I-nice approach, Pairwise constraints selection, Semi-supervised clustering