A Cluster-then-label Approach for Few-shot Learning with Application to Automatic Image Data Labeling.

Journal of Data and Information Quality(2022)

引用 4|浏览12
暂无评分
摘要
Few-shot learning (FSL) aims at learning to generalize from only a small number of labeled examples for a given target task. Most current state-of-the-art FSL methods typically have two limitations. First, they usually require access to a source dataset (in a similar domain) with abundant labeled examples, which may not always be possible due to privacy concerns and copyright issues. Second, they typically do not offer any estimation of the generalization error on the target FSL task, because the handful of labeled examples must be used for training and cannot spare a validation subset. In this article, we propose a cluster-then-label approach to perform few-shot learning. Our approach does not require access to the labeled source dataset and provides an estimation of generalization error. We show empirically, on four benchmark datasets, that our approach provides competitive predictive performance to state-of-the-art FSL approaches and our generalization error estimation is accurate. Finally, we explore the application of our proposed method to automatic image data labeling. We compare our method with existing automatic data labeling systems. The end-to-end performance of our method outperforms the state-of-the-art automatic data labeling system Snuba by 26% and is only 7% away from the fully supervised upper bound.
更多
查看译文
关键词
Few-shot learning, cluster-then-label, data labeling
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要