A Comparative Study of Gender Bias Mitigation Techniques and the Role of Implicit Stereotypes

Nasim Sobhani, Sarah Jane Delany

Research Square (2023)

Abstract: Natural language models and systems have been shown to reflect the gender bias present in their training data. This bias can affect the downstream tasks that machine learning models built on such data are meant to accomplish. A variety of techniques have been proposed to mitigate gender bias in training data. In this paper we compare different gender bias mitigation approaches on a classification task. We consider mitigation techniques that manipulate the training data itself, including data scrubbing, gender-swapping, and counterfactual data augmentation. We also consider how gender neutralization, which replaces gender-specific terms with their gender-neutral equivalents, performs as a bias mitigation technique. We evaluate how effectively each approach reduces gender bias in the training data and how it affects task performance. In addition, we examine how using de-biased word embeddings to represent the training data affects gender bias mitigation. Our results show that many of these techniques do not adversely affect classification performance, but there is significant variation in how effectively the different techniques mitigate gender bias. These approaches typically focus on lexical or explicit gender in text, and while they reduce gender bias in training data, they do not remove it completely. We investigate the source of the remaining gender bias using stereotypically masculine and feminine words, which can signal gender implicitly through gender roles or behaviour. We show how certain words with an associated implicit gender may contribute to gender bias, even after attempts to neutralize gender in language.
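The data-manipulation techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the word lists `SWAP`, `NEUTRAL`, `PRONOUNS` and `IMPLICIT` are hypothetical and deliberately tiny, scrubbing is assumed here to mean deleting gendered pronouns, and counterfactual augmentation is assumed to mean keeping each original example and adding its gender-swapped copy with the same label. The helper `implicit_cues` (also hypothetical) shows why explicit neutralization can leave implicit, stereotype-associated words behind.

```python
import re

# Hypothetical, deliberately tiny lexicons for illustration only;
# real studies use much larger curated lists of gendered word pairs.
SWAP = {
    "he": "she", "she": "he",
    "him": "her", "his": "her",
    "her": "his",  # ambiguous ("him" or "his"); a real system needs POS tagging
    "man": "woman", "woman": "man",
    "actor": "actress", "actress": "actor",
}
NEUTRAL = {
    "he": "they", "she": "they",
    "him": "them", "his": "their",
    "her": "their",  # same ambiguity as above
    "man": "person", "woman": "person",
    "actress": "actor",
}
# Assumption for this sketch: scrubbing deletes only gendered pronouns.
PRONOUNS = {"he", "she", "him", "her", "his", "hers", "himself", "herself"}
# Hypothetical stereotype lexicon: words that can signal gender implicitly.
IMPLICIT = {"gentle": "feminine", "nurturing": "feminine",
            "assertive": "masculine", "ambitious": "masculine"}

WORD = re.compile(r"[A-Za-z]+")  # maximal runs of letters, so "the" never matches "he"


def _match_case(original: str, replacement: str) -> str:
    """Capitalize the replacement if the original word was capitalized."""
    return replacement.capitalize() if original[0].isupper() else replacement


def _map_words(text: str, mapping: dict) -> str:
    """Replace every whole word found in `mapping` (case-insensitive lookup)."""
    def repl(m):
        word = m.group(0)
        new = mapping.get(word.lower())
        return word if new is None else _match_case(word, new)
    return WORD.sub(repl, text)


def gender_swap(text: str) -> str:
    """Replace each gendered word with its opposite-gender counterpart."""
    return _map_words(text, SWAP)


def neutralize(text: str) -> str:
    """Replace gender-specific words with gender-neutral equivalents."""
    return _map_words(text, NEUTRAL)


def scrub(text: str) -> str:
    """Delete gendered pronouns, then collapse the leftover whitespace."""
    kept = WORD.sub(lambda m: "" if m.group(0).lower() in PRONOUNS else m.group(0), text)
    return " ".join(kept.split())


def counterfactual_augment(data):
    """Keep every (text, label) pair and add its gender-swapped copy with the same label."""
    augmented = list(data)
    for text, label in data:
        swapped = gender_swap(text)
        if swapped != text:  # only add a counterfactual when something changed
            augmented.append((swapped, label))
    return augmented


def implicit_cues(text: str):
    """List (word, implied gender) pairs from the stereotype lexicon found in `text`."""
    return [(w, IMPLICIT[w.lower()]) for w in WORD.findall(text) if w.lower() in IMPLICIT]


if __name__ == "__main__":
    sentence = "He thanked the gentle actress."
    print(gender_swap(sentence))  # She thanked the gentle actor.
    print(neutralize(sentence))   # They thanked the gentle actor.
    print(scrub(sentence))        # thanked the gentle actress.
    print(counterfactual_augment([(sentence, "positive")]))
    # Explicit gender is gone after neutralization, but an implicit cue remains:
    print(implicit_cues(neutralize(sentence)))  # [('gentle', 'feminine')]
```

The last line mirrors the abstract's point in miniature: neutralization removes the explicit markers ("He", "actress"), yet a stereotype-associated word such as "gentle" (in this hypothetical lexicon) still carries an implicit gender signal that a classifier could pick up.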
Keywords: implicit stereotypes, gender bias mitigation techniques