Projective Methods for Mitigating Gender Bias in Pre-trained Language Models
CoRR (2024)
Abstract
Mitigation of gender bias in NLP has a long history tied to debiasing static
word embeddings. More recently, attention has shifted to debiasing pre-trained
language models. We study to what extent the simplest projective debiasing
methods, developed for word embeddings, can help when applied to BERT's
internal representations. Projective methods are fast to implement, use a small
number of saved parameters, and make no updates to the existing model
parameters. We evaluate the efficacy of the methods in reducing both intrinsic
bias, as measured by BERT's next sentence prediction task, and in mitigating
observed bias in a downstream setting when fine-tuned. To this end, we also
provide a critical analysis of a popular gender-bias assessment test for
quantifying intrinsic bias, resulting in an enhanced test set and new bias
measures. We find that projective methods can be effective at both intrinsic
bias and downstream bias mitigation, but that the two outcomes are not
necessarily correlated. This finding serves as a warning that intrinsic bias
test sets, based either on language modeling tasks or next sentence prediction,
should not be the only benchmark in developing a debiased language model.