Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks
CoRR (2024)
Abstract
Zero-shot cross-lingual generation refers to finetuning a multilingual
pretrained language model on a generation task in one language and then using
it to make predictions for the same task in other languages. Previous works
have noted the frequent problem of generation in the wrong language and have
proposed approaches to address it, usually using mT5 as the backbone model. In
this work, we compare various approaches from the literature in a unified
setting, also including alternative backbone models, namely mBART and
NLLB-200. We first underline the importance of tuning the learning rate used
for finetuning, which substantially alleviates the problem of generation in
the wrong language. Then, we show that with careful learning rate tuning,
simple full finetuning of the model is a very strong baseline, and alternative
approaches bring only marginal improvements. Finally, we find that mBART
performs similarly to mT5 of the same size, and NLLB-200 can be competitive in
some cases. Our final models reach the performance of the approach based on
data translation, which is usually considered an upper baseline for zero-shot
cross-lingual generation.
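
The setup described in the abstract can be illustrated with a minimal sketch using HuggingFace Transformers and an mT5 backbone: finetune on a generation task in one language, then decode inputs in another language with the same checkpoint. The model name, toy data, number of steps, and learning rate below are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (assumed setup, not the paper's exact configuration):
# finetune mT5 on an English generation task, then apply it zero-shot
# to an input in another language.
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

model_name = "google/mt5-small"  # assumed backbone; the paper also studies mBART and NLLB-200
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)

# Toy English training pair standing in for a real generation dataset.
train_src = ["summarize: The cat sat on the mat and refused to move all day."]
train_tgt = ["A cat stayed on the mat all day."]

# The paper stresses that the finetuning learning rate must be tuned carefully;
# the value here is only a placeholder.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for _ in range(3):  # a few steps for illustration only
    inputs = tokenizer(train_src, return_tensors="pt", padding=True)
    labels = tokenizer(text_target=train_tgt, return_tensors="pt", padding=True).input_ids
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Zero-shot transfer: the finetuned model is applied directly to another
# language (French here), with no finetuning data in that language.
model.eval()
test_fr = ["summarize: Le chat est resté sur le tapis toute la journée sans bouger."]
inputs = tokenizer(test_fr, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```

In this setting, a common failure mode is that the generated summary comes out in the finetuning language rather than the input language, which is the wrong-language problem the paper addresses through learning rate tuning and the compared approaches.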