MiTTenS: A Dataset for Evaluating Misgendering in Translation
CoRR (2024)
Abstract
Misgendering is the act of referring to someone in a way that does not
reflect their gender identity. Translation systems, including foundation models
capable of translation, can produce errors that result in misgendering harms.
To measure the extent of such potential harms when translating into and out of
English, we introduce a dataset, MiTTenS, covering 26 languages from a variety
of language families and scripts, including several traditionally
underrepresented in digital resources. The dataset is constructed with
handcrafted passages that target known failure patterns, longer synthetically
generated passages, and natural passages sourced from multiple domains. We
demonstrate the usefulness of the dataset by evaluating both dedicated neural
machine translation systems and foundation models, and show that all systems
exhibit errors resulting in misgendering harms, even in high-resource
languages.