Labels, Information, and Computation: Efficient Learning Using Sufficient Labels
JOURNAL OF MACHINE LEARNING RESEARCH (2023)
Abstract
In supervised learning, obtaining a large fully-labeled training set is expensive. We show that full label information on every training example is not always needed to train a competent classifier. Specifically, inspired by the principle of sufficiency in statistics, we present a statistic (a summary) of the fully-labeled training set that captures almost all the relevant information for classification but is easier to obtain directly. We call this statistic "sufficiently-labeled data" and prove its sufficiency and efficiency for finding the optimal hidden representations, on which competent classifier heads can be trained using as few as a single randomly chosen fully-labeled example per class. Sufficiently-labeled data can be obtained from annotators directly, without first collecting fully-labeled data, and we prove that it is easier to obtain sufficiently-labeled data directly than to obtain fully-labeled data. Furthermore, sufficiently-labeled data is naturally more secure, since it stores relative, instead of absolute, information. Extensive experimental results are provided to support our theory.
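To make the idea of relative label information concrete, here is a minimal illustrative sketch, not the paper's actual algorithm: it assumes the annotations take the form of "same class?" judgments on pairs of examples, plus a single fully-labeled anchor example per class. All names (`DisjointSet`, `recover_labels`) are hypothetical. Grouping examples by the relative annotations and then naming each group with its anchor recovers full labels, illustrating why one fully-labeled example per class can suffice once the relative structure is known.

```python
# Hypothetical sketch: relative ("same class?") pair annotations plus one
# fully-labeled anchor per class are enough to label every example.

class DisjointSet:
    """Union-find structure for grouping examples linked by 'same class' pairs."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        # Path-halving: walk to the root, shortcutting pointers along the way.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)

def recover_labels(n, same_pairs, anchors):
    """same_pairs: (i, j) pairs annotated as belonging to the same class
    (relative information only; no absolute labels).
    anchors: {index: class_label}, one fully-labeled example per class."""
    ds = DisjointSet(n)
    for i, j in same_pairs:
        ds.union(i, j)
    # Name each group using its anchor's absolute label.
    root_label = {ds.find(i): y for i, y in anchors.items()}
    return [root_label.get(ds.find(i)) for i in range(n)]

# Six examples whose true classes are [cat, cat, cat, dog, dog, dog],
# unknown to the learner; only pairwise annotations are collected.
pairs = [(0, 1), (1, 2), (3, 4), (4, 5)]   # "same class" annotations
anchors = {0: "cat", 3: "dog"}             # one labeled example per class
print(recover_labels(6, pairs, anchors))   # ['cat', 'cat', 'cat', 'dog', 'dog', 'dog']
```

In the paper's actual setting the relative information is used to learn hidden representations rather than exact partitions, but the sketch shows the same division of labor: relative annotations carry the class structure, and a handful of absolute labels pin down the class names.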
Keywords
Classification, Deep Learning, Data Efficiency, Data Security, Privacy-Preserving Learning