Multistage Collaborative Knowledge Distillation from a Large Language Model for Semi-Supervised Sequence Generation
CoRR (2023)
Abstract
We study semi-supervised sequence generation tasks, where the few labeled
examples are too scarce to finetune a model, and meanwhile, few-shot prompted
large language models (LLMs) exhibit room for improvement. In this paper, we
present the discovery that a student model distilled from a few-shot prompted
LLM can commonly generalize better than its teacher to unseen examples on such
tasks. We find that the student is able to learn a general pattern from the
high-quality pseudolabels produced by the teacher during knowledge distillation
(KD), and, favorably, not from the low-quality pseudolabels.
Leveraging this discovery, we propose a new method, Multistage Collaborative
Knowledge Distillation from an LLM (MCKD), for these tasks. MCKD first few-shot
prompts an LLM to produce pseudolabels for unlabeled data. Then, at each stage
of an iterative KD process, a new pair of students is trained on disjoint
partitions of the pseudolabeled data, and each produces new and improved
pseudolabels for the partition it has not seen. We conduct extensive experiments on
four syntactic and semantic parsing datasets and show the effectiveness of MCKD
for low-resource semi-supervised sequence generation. On CRAFT biomedical
parsing, for example, 3-stage MCKD with 50 labeled examples outperforms both an
LLM teacher (by 7.5 F1 points) and vanilla KD, and matches the performance of
supervised finetuning with 500 labeled examples.
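As a rough illustration only: the following minimal Python sketch mirrors the loop the abstract describes. All names and signatures here (mckd, llm_pseudolabel, train, num_stages) are hypothetical placeholders rather than the authors' implementation, and the final step of training one consolidating student on all cross-generated pseudolabels is an assumption about how the stages are composed.

```python
from typing import Callable, Dict, List, Tuple

Labeler = Callable[[str], str]  # maps an input x to a label y
Trainer = Callable[[List[Tuple[str, str]]], Labeler]  # finetunes a student on (x, y) pairs


def mckd(
    unlabeled: List[str],
    llm_pseudolabel: Labeler,  # hypothetical: few-shot prompted LLM labeler
    train: Trainer,            # hypothetical: student training routine
    num_stages: int = 3,
) -> Labeler:
    # Stage 0: the few-shot prompted LLM pseudolabels every unlabeled example.
    labels: Dict[str, str] = {x: llm_pseudolabel(x) for x in unlabeled}

    # Two disjoint partitions of the unlabeled data.
    mid = len(unlabeled) // 2
    part_a, part_b = unlabeled[:mid], unlabeled[mid:]

    for _ in range(num_stages):
        # Train a fresh pair of students, one per pseudolabeled partition.
        student_a = train([(x, labels[x]) for x in part_a])
        student_b = train([(x, labels[x]) for x in part_b])

        # Cross-labeling: each student relabels the partition it did NOT
        # train on, yielding improved pseudolabels for the next stage.
        for x in part_b:
            labels[x] = student_a(x)
        for x in part_a:
            labels[x] = student_b(x)

    # Assumed final step: distill one student from all cross-generated
    # pseudolabels and return it.
    return train(list(labels.items()))
```

Having each student relabel only the partition it never trained on keeps its pseudolabels out-of-sample for the data it relabels, which is the mechanism the abstract credits for the improved pseudolabels at each stage.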