On Bias and Fairness in NLP: Investigating the Impact of Bias and Debiasing in Language Models on the Fairness of Toxicity Detection
arXiv (2023)
Abstract
Language models are the new state-of-the-art natural language processing
(NLP) models, and they are increasingly used in many NLP tasks. Although
there is evidence that language models are biased, the impact of that bias on
the fairness of downstream NLP tasks is still understudied. Likewise, although
numerous debiasing methods have been proposed in the literature, the impact of
bias removal on the fairness of NLP tasks is also understudied. In this work,
we investigate three sources of bias in NLP models, i.e., representation bias,
selection bias, and overamplification bias, and examine how they impact the
fairness of the downstream task of toxicity detection. Moreover, we
investigate the impact of removing these biases, using different bias removal
techniques, on the fairness of toxicity detection. Our results provide strong
evidence that downstream sources of bias, especially overamplification bias,
have the greatest impact on the fairness of toxicity detection. We also find
strong evidence that removing overamplification bias by fine-tuning the
language models on a dataset with balanced contextual representations and
balanced ratios of positive examples across identity groups improves the
fairness of toxicity detection. Finally, we build on these findings and
introduce a list of guidelines to help ensure the fairness of the task of
toxicity detection.
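To make the overamplification-debiasing idea concrete, below is a minimal sketch of one way to balance the ratio of positive (toxic) examples across identity groups before fine-tuning. This is an illustration, not the authors' exact procedure: the column names `identity_group` and `toxic`, the downsampling strategy, and the helper name are all assumptions introduced here for clarity.

```python
import pandas as pd

def balance_positive_ratios(df: pd.DataFrame,
                            group_col: str = "identity_group",
                            label_col: str = "toxic",
                            seed: int = 0) -> pd.DataFrame:
    """Downsample so every identity group has the same ratio of positive
    (toxic) examples. Hypothetical helper approximating the balanced
    fine-tuning data described in the abstract."""
    # Target ratio: the smallest positive ratio observed across groups,
    # so we only ever remove examples rather than duplicate them.
    target = df.groupby(group_col)[label_col].mean().min()

    balanced = []
    for _, part in df.groupby(group_col):
        pos = part[part[label_col] == 1]
        neg = part[part[label_col] == 0]
        # Keep n_pos positives so that n_pos / (n_pos + len(neg)) == target.
        n_pos = int(round(target * len(neg) / (1 - target)))
        balanced.append(pd.concat([
            pos.sample(n=min(n_pos, len(pos)), random_state=seed),
            neg,
        ]))
    # Shuffle the recombined dataset before fine-tuning.
    return pd.concat(balanced).sample(frac=1, random_state=seed)
```

Under this sketch, the returned dataframe would then be used to fine-tune the language model with any standard toxicity-classification setup; the abstract's notion of "balanced contextual representations" (balancing how identity terms appear in context) would require an additional data-construction step not shown here.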