Confident Sinkhorn Allocation for Pseudo-Labeling

ICLR 2023 (2022)

Abstract
Semi-supervised learning is a critical tool for reducing machine learning's dependence on labeled data. It has been applied successfully to structured data, such as images and language, by exploiting the inherent spatial and semantic structure therein with pretrained models or data augmentation. These methods are not applicable, however, when such domain structure is unavailable, because pretrained models and data augmentation cannot be used. Owing to their simplicity, existing pseudo-labeling (PL) methods can be applied without any domain assumptions, but they are vulnerable to noisy samples and to greedy assignments based on a predefined confidence threshold that is typically unknown. This paper addresses these problems by proposing Confident Sinkhorn Allocation (CSA), which assigns labels only to samples with high confidence scores and learns the best label allocation via optimal transport. CSA outperforms the current state-of-the-art in this practically important area of semi-supervised learning. Our code is publicly available at https://github.com/amzn/confident-sinkhorn-allocation.
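The abstract describes two ingredients: selecting only high-confidence unlabeled samples, and allocating pseudo-labels to them via entropic-regularized optimal transport (Sinkhorn iterations) rather than greedy per-sample argmax. The sketch below illustrates that idea in Python; it is not the authors' implementation (see the linked repository for that), and the function name, the confidence threshold, and the uniform class marginals are illustrative assumptions.

```python
import numpy as np

def sinkhorn_allocation(probs, conf_threshold=0.9, class_marginal=None,
                        epsilon=0.1, n_iters=50):
    """Illustrative sketch: pseudo-label confident samples via Sinkhorn OT.

    probs: (n_samples, n_classes) predicted class probabilities on unlabeled data.
    Only samples whose max probability exceeds `conf_threshold` are considered.
    Returns (indices, labels) for the samples that receive a pseudo-label.
    """
    # Keep only high-confidence samples (the "confident" part of the scheme).
    confident = probs.max(axis=1) >= conf_threshold
    p = probs[confident]
    n, k = p.shape
    if n == 0:
        return np.array([], dtype=int), np.array([], dtype=int)

    # Cost of assigning sample i to class j: negative log-probability.
    cost = -np.log(p + 1e-12)

    # Marginals: each confident sample receives equal mass; classes share mass
    # uniformly unless a class prior is supplied (an assumption of this sketch).
    r = np.full(n, 1.0 / n)
    c = np.full(k, 1.0 / k) if class_marginal is None else class_marginal

    # Entropic-regularized OT solved with Sinkhorn fixed-point updates.
    K = np.exp(-cost / epsilon)
    u = np.ones(n)
    for _ in range(n_iters):
        v = c / (K.T @ u)
        u = r / (K @ v)
    plan = u[:, None] * K * v[None, :]   # transport plan, shape (n, k)

    labels = plan.argmax(axis=1)          # hard pseudo-label per confident sample
    return np.flatnonzero(confident), labels
```

In contrast to thresholded greedy assignment, the transport plan couples all confident samples through the class marginals, so no single class can absorb every pseudo-label; the returned indices and labels would then be appended to the labeled set before retraining the classifier.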
Keywords
pseudo-labeling,semi-supervised learning,tabular data