Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation

Neural Generation and Translation (2020)

Abstract
We explore best practices for training small, memory-efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting. While both domain adaptation and knowledge distillation are widely used, their interaction remains little understood. Our large-scale empirical results in machine translation (on three language pairs with three domains each) suggest distilling twice for best performance: once using general-domain data and again using in-domain data with an adapted teacher. The code for these experiments can be found here.(1)
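
The recipe summarized above (distill on general-domain data, adapt the teacher to the target domain, then distill again) can be sketched in code. The block below is not the authors' released code; `train` and `decode` are hypothetical placeholders standing in for a real NMT toolkit, and only the control flow is meant to mirror the recipe.

```python
# A minimal sketch, assuming hypothetical train/decode helpers, of the
# "distill, adapt, distill" recipe described in the abstract.

def train(src, tgt, init=None, size="large"):
    """Placeholder: train an NMT model of the given size on (src, tgt),
    optionally continuing from an existing checkpoint `init`."""
    return {"size": size, "num_pairs": len(src), "init": init}

def decode(model, src):
    """Placeholder: translate `src` with `model` (e.g. via beam search)."""
    return [f"<hyp {i} from {model['size']} model>" for i, _ in enumerate(src)]

# Toy corpora standing in for general-domain and in-domain bitext.
general_src, general_tgt = ["g1", "g2"], ["G1", "G2"]
domain_src, domain_tgt = ["d1"], ["D1"]

# 1) Distill (general domain): train a large teacher, then train the small
#    student on the teacher's translations of the general-domain source
#    (sequence-level knowledge distillation).
teacher = train(general_src, general_tgt, size="large")
student = train(general_src, decode(teacher, general_src), size="small")

# 2) Adapt: continue training the teacher on in-domain parallel data.
adapted_teacher = train(domain_src, domain_tgt, init=teacher, size="large")

# 3) Distill again (in domain): fine-tune the student on the adapted
#    teacher's translations of the in-domain source.
adapted_student = train(domain_src, decode(adapted_teacher, domain_src),
                        init=student, size="small")
```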
Keywords
distill, translation, neural