Inference-Time Rule Eraser: Distilling and Removing Bias Rules to Mitigate Bias in Deployed Models
arXiv (2024)
Abstract
Fairness is critical for artificial intelligence systems, especially those
deployed in high-stakes applications such as hiring and justice. Existing
efforts toward fairness in machine learning require retraining or fine-tuning
the neural network weights to meet the fairness criteria. However, this is
often infeasible in practice for regular model users, who cannot access or
modify model weights. In this paper, we propose a more flexible fairness
paradigm, Inference-Time Rule Eraser (or simply Eraser), which considers the
case where model weights cannot be accessed and tackles fairness issues from
the perspective of biased rule removal at inference time. We first verify,
through Bayesian analysis, the feasibility of modifying the model output to
remove the biased rule, and derive Inference-Time Rule Eraser, which removes
biased rules by subtracting the logarithmic value associated with unfair rules
(i.e., the model's response to biased features) from the model's logit output.
Moreover, we present a specific implementation of Rule Eraser that involves
two stages: (1) limited queries are performed on the model with inaccessible
weights to distill its biased rules into an additional patched model, and
(2) at inference time, the biased rules distilled into the patched model are
excluded from the output of the original model, following the removal strategy
outlined in Rule Eraser. Extensive experimental evaluation demonstrates the
effectiveness and superior performance of the proposed Rule Eraser in
addressing fairness concerns.
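The removal step described above can be sketched numerically: given the deployed model's logits and the biased-rule log-response distilled into the patched model, the debiased output is their elementwise difference. This is a minimal illustration assuming both quantities are available as vectors over the same classes; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def erase_bias_rules(original_logits, bias_rule_logresponse):
    """Sketch of the Eraser removal step: subtract the log-valued
    biased-rule response (distilled into the patched model) from the
    deployed model's logit output."""
    return np.asarray(original_logits) - np.asarray(bias_rule_logresponse)

# Illustrative numbers (hypothetical): the biased rule inflates class 0,
# so removing it flips the prediction from class 0 to class 1.
logits = np.array([2.0, 1.0])   # deployed model's logits
bias = np.array([1.8, 0.2])     # distilled biased-rule log-response
debiased = erase_bias_rules(logits, bias)
```

In this toy example the original logits favor class 0, while the debiased logits favor class 1, showing how subtracting the biased-rule response can change the final prediction without touching model weights.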