Open-Domain Semi-Supervised Learning via Glocal Cluster Structure Exploitation

IEEE Transactions on Knowledge and Data Engineering (2024)

Abstract
Semi-supervised learning (SSL) aims to reduce the heavy reliance of current deep models on costly manual annotation by leveraging a large amount of unlabeled data in combination with a much smaller set of labeled data. However, most existing SSL methods assume that all labeled and unlabeled data are drawn from the same feature distribution, which can be impractical in real-world applications. In this study, we take the initial step to systematically investigate the open-domain semi-supervised learning setting, where a feature distribution mismatch exists between labeled and unlabeled data. In pursuit of an effective solution for open-domain SSL, we propose a novel framework called GlocalMatch, which aims to exploit both the global and local (i.e., glocal) cluster structure of open-domain unlabeled data. The glocal cluster structure is utilized in two complementary ways. Firstly, GlocalMatch optimizes a Glocal Cluster Compacting (GCC) objective that encourages feature representations of the same class, whether within the same domain or across different domains, to become closer to each other. Secondly, GlocalMatch incorporates a Glocal Semantic Aggregation (GSA) strategy to produce more reliable pseudo-labels by aggregating predictions from neighboring clusters. Extensive experiments demonstrate that GlocalMatch significantly outperforms state-of-the-art SSL methods, achieving superior performance for both in-domain and out-of-domain generalization. The code is released at https://github.com/nukezil/GlocalMatch.
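The two ideas in the abstract can be illustrated with a minimal NumPy sketch. Note this is an illustrative approximation, not the paper's actual implementation: `glocal_compacting_loss` captures the GCC idea of pulling same-class features toward a shared centroid regardless of domain, and `aggregate_cluster_predictions` captures the GSA idea of refining a sample's pseudo-label with the mean prediction of its cluster. The function names, the mean-squared-distance form of the loss, and the blending weight `alpha` are all assumptions for illustration.

```python
import numpy as np

def glocal_compacting_loss(features, labels):
    """Sketch of a GCC-style objective: mean squared distance of each
    feature to its class centroid, so same-class features (from any
    domain) are encouraged to move closer together."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    loss = 0.0
    for c in np.unique(labels):
        cls = features[labels == c]
        centroid = cls.mean(axis=0)          # shared class centroid
        loss += np.sum((cls - centroid) ** 2)
    return loss / len(features)

def aggregate_cluster_predictions(probs, cluster_ids, alpha=0.5):
    """Sketch of a GSA-style step: blend each sample's softmax output
    with its cluster's mean prediction, then take the argmax as the
    (hopefully more reliable) pseudo-label."""
    probs = np.asarray(probs, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    agg = probs.copy()
    for c in np.unique(cluster_ids):
        mask = cluster_ids == c
        cluster_mean = probs[mask].mean(axis=0)
        agg[mask] = alpha * probs[mask] + (1 - alpha) * cluster_mean
    return agg.argmax(axis=1)
```

In this toy form, a borderline sample whose own prediction narrowly favors the wrong class can be corrected when most of its cluster neighbors confidently agree on another class.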
Keywords
Semi-supervised learning, distribution mismatch, cluster structure, pseudo-labeling