Words Can Be Confusing: Stereotype Bias Removal in Text Classification at the Word Level

Shaofei Shen, Mingzhe Zhang, Weitong Chen, Alina Bialkowski, Miao Xu

PAKDD (4) (2023)

Abstract
Text classification is a widely used task in natural language processing. However, stereotype bias in text classification can lead to unfair and inaccurate predictions. Stereotype bias is particularly prevalent in words that are unevenly distributed across classes and are associated with specific categories. This bias can be further strengthened in models pre-trained on large natural language datasets. Prior work on removing stereotype bias has mainly focused on specific demographic groups or relied on specific thesauri, without measuring the influence of stereotype words on predictions. In this work, we present a causal analysis of how stereotype bias arises and affects text classification, and propose a framework to mitigate it. Our framework detects potential stereotype bias words using SHAP values and alleviates bias at the prediction stage through a counterfactual approach. Unlike existing debiasing methods, our framework does not rely on existing stereotype word sets and can dynamically evaluate the influence of words on stereotype bias. Extensive experiments and ablation studies show that our approach effectively improves classification performance while mitigating stereotype bias.
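The abstract names two steps (SHAP-based detection of candidate stereotype words, then counterfactual correction at prediction time) but does not give the algorithm. The following is a minimal sketch of those two ideas under stated assumptions, not the authors' implementation. Everything in it is hypothetical: the toy linear bag-of-words classifier and its made-up weights (`WEIGHTS`, `INTERCEPT`, `logit`), the exact Shapley computation by coalition enumeration (SHAP-style estimators approximate these values for realistic lengths), the threshold-based flagging rule (`flag_words`), and the counterfactual form (`counterfactual_logit`), which subtracts the logit change that the flagged words produce on their own.

```python
import math
from itertools import combinations

# Hypothetical toy classifier (not from the paper): a linear model over word
# indicators. "nurse" carries a made-up spurious weight standing in for a
# stereotype word; all weights are illustrative only.
WEIGHTS = {"the": 0.0, "nurse": 1.5, "was": 0.0, "late": -0.5}
INTERCEPT = -0.2


def logit(words):
    """Positive-class logit for a bag of words; masked (absent) words contribute 0."""
    return INTERCEPT + sum(WEIGHTS.get(w, 0.0) for w in words)


def shapley_values(words, value_fn):
    """Exact Shapley value of each word position, treating masked words as absent.

    Enumerates every coalition of the other positions, so the cost grows as 2**n;
    SHAP-style estimators approximate these values for longer inputs.
    """
    n = len(words)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                without_i = [words[j] for j in subset]
                phi[i] += weight * (value_fn(without_i + [words[i]]) - value_fn(without_i))
    return phi


def flag_words(words, value_fn, threshold):
    """Hypothetical rule: flag words whose absolute attribution reaches the threshold."""
    phi = shapley_values(words, value_fn)
    return {w for w, p in zip(words, phi) if abs(p) >= threshold}


def counterfactual_logit(words, value_fn, flagged):
    """Assumed counterfactual correction: remove the effect of the flagged words alone.

    The counterfactual input keeps only the flagged words; its logit change
    relative to the empty input is subtracted from the factual logit.
    """
    flagged_only = [w for w in words if w in flagged]
    return value_fn(words) - (value_fn(flagged_only) - value_fn([]))


if __name__ == "__main__":
    sentence = ["the", "nurse", "was", "late"]
    print([round(p, 6) for p in shapley_values(sentence, logit)])
    flagged = flag_words(sentence, logit, threshold=1.0)
    print(flagged)
    print(round(logit(sentence), 3), round(counterfactual_logit(sentence, logit, flagged), 3))
```

For this linear toy the Shapley value of each word equals its weight, so only "nurse" passes the threshold, and the corrected logit coincides with simply deleting the flagged word. For a non-linear model such as a fine-tuned transformer, the two generally differ, which is why a counterfactual correction is not equivalent to word removal there.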
Keywords
stereotype bias removal, text classification, word level