From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models
arXiv (2024)
Abstract
To date, toxicity mitigation in language models has almost entirely been
focused on single-language settings. As language models embrace multilingual
capabilities, it is crucial that our safety measures keep pace. Recognizing this
research gap, our approach expands the scope of conventional toxicity
mitigation to address the complexities presented by multiple languages. In the
absence of sufficient annotated datasets across languages, we employ translated
data to evaluate and enhance our mitigation techniques. We also compare
finetuning mitigation approaches against retrieval-augmented techniques under
both static and continual toxicity mitigation scenarios. This allows us to
examine the effects of translation quality and cross-lingual transfer on
toxicity mitigation. We also explore how model size and data quantity affect
the success of these mitigation efforts. Covering nine languages, our study
represents a broad array of linguistic families and levels of resource
availability, ranging from high to mid-resource languages. Through
comprehensive experiments, we provide insights into the complexities of
multilingual toxicity mitigation, paving the way for future research in this
increasingly important field. Code and data are
available at https://github.com/for-ai/goodtriever.
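
For intuition on the retrieval-augmented side of the comparison, the sketch below shows one common way such mitigation can work: ensembling the base model's next-token distribution with distributions derived from kNN lookups over non-toxic and toxic datastores (in the spirit of kNN-LM / DExperts-style ensembling). This is a minimal illustrative sketch, not the paper's exact formulation; all function names, weights, and shapes are assumptions.

```python
# Minimal sketch (assumption, not the paper's exact method) of retrieval-augmented
# toxicity mitigation: boost tokens favoured by a non-toxic datastore and penalise
# tokens favoured by a toxic one. All names and hyperparameters are illustrative.
import numpy as np

def knn_next_token_probs(query, keys, values, vocab_size, k=8, temperature=1.0):
    """Turn the k nearest datastore entries into a next-token distribution."""
    dists = np.linalg.norm(keys - query, axis=1)        # L2 distance to every stored key
    nearest = np.argsort(dists)[:k]                     # indices of the k closest keys
    weights = np.exp(-dists[nearest] / temperature)     # closer neighbours get more weight
    probs = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        probs[values[idx]] += w                         # accumulate weight on each neighbour's next token
    total = probs.sum()
    return probs / total if total > 0 else np.full(vocab_size, 1.0 / vocab_size)

def mitigated_probs(lm_probs, query, nontoxic_store, toxic_store, vocab_size, alpha=2.0):
    """Combine the base LM with the two datastores in log space (product-of-experts style)."""
    p_plus = knn_next_token_probs(query, *nontoxic_store, vocab_size)
    p_minus = knn_next_token_probs(query, *toxic_store, vocab_size)
    logits = np.log(lm_probs + 1e-9) + alpha * (np.log(p_plus + 1e-9) - np.log(p_minus + 1e-9))
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab, dim = 100, 16
    query = rng.normal(size=dim)                        # stand-in hidden state for the current prefix
    lm_probs = rng.dirichlet(np.ones(vocab))            # stand-in base-LM next-token distribution
    nontoxic_store = (rng.normal(size=(500, dim)), rng.integers(0, vocab, 500))
    toxic_store = (rng.normal(size=(500, dim)), rng.integers(0, vocab, 500))
    out = mitigated_probs(lm_probs, query, nontoxic_store, toxic_store, vocab)
    print(out.argmax(), round(float(out.max()), 4))
```

Because the datastores are plain key-value stores, they can be extended with newly collected (or translated) toxic examples without retraining the base model, which is what makes this family of methods a natural fit for the continual mitigation scenario the abstract mentions.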