mEdIT: Multilingual Text Editing via Instruction Tuning
CoRR(2024)
Abstract
We introduce mEdIT, a multilingual extension to CoEdIT, the recent
state-of-the-art text editing models for writing assistance. mEdIT models are
trained by fine-tuning large multilingual pre-trained language models (LLMs)
via instruction tuning. They take natural language instructions from the user
that specify the attributes of the desired text, such as
"Grammatik korrigieren" (German) or "Parafrasee la oración"
(Spanish). We build mEdIT by curating data from multiple publicly available
human-annotated text editing datasets for three text editing tasks (Grammatical
Error Correction (GEC), Text Simplification, and Paraphrasing) across diverse
languages belonging to six different language families. We detail the design
and training of mEdIT models and demonstrate their strong performance on many
multilingual text editing benchmarks against other multilingual LLMs. We also
find that mEdIT generalizes to new languages more effectively than multilingual
baselines. We publicly release our data, code, and trained models at
https://github.com/vipulraheja/medit.