The Double-Edged Sword of Input Perturbations to Robust Accurate Fairness
arXiv (2024)
Abstract
Deep neural networks (DNNs) are known to be sensitive to adversarial input
perturbations, leading to a reduction in either prediction accuracy or
individual fairness. To jointly characterize the susceptibility of prediction
accuracy and individual fairness to adversarial perturbations, we introduce a
novel robustness definition termed robust accurate fairness. Informally, robust
accurate fairness requires that predictions for an instance and its similar
counterparts consistently align with the ground truth when subjected to input
perturbations. We propose an adversarial attack approach, dubbed RAFair, to
expose false or biased adversarial defects in DNNs, which either undermine
prediction accuracy or compromise individual fairness. Then, we show that such adversarial
instances can be effectively addressed by carefully designed benign
perturbations, correcting their predictions to be accurate and fair. Our work
explores the double-edged sword of input perturbations to robust accurate
fairness in DNNs and the potential of using benign perturbations to correct
adversarial instances.
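The informal definition above can be sketched as an empirical check: an instance satisfies robust accurate fairness when its prediction and those of its similar counterparts all match the ground truth under sampled input perturbations. The following is a minimal illustration, not the paper's method; the `predict` function, the uniform-noise perturbation model, and the sampling parameters are all assumptions for the sketch.

```python
import numpy as np

def robust_accurate_fairness(predict, x, similar_xs, y_true,
                             eps=0.1, n_samples=100, seed=0):
    """Empirically check robust accurate fairness on sampled perturbations.

    Hypothetical helper: `predict` maps a feature vector to a class label.
    Per the abstract's informal definition, the property requires that
    predictions for `x` and each similar counterpart stay aligned with the
    ground truth `y_true` under input perturbations (modeled here, as an
    assumption, by uniform noise of magnitude `eps`, sampled `n_samples`
    times).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    for _ in range(n_samples):
        delta = rng.uniform(-eps, eps, size=x.shape)
        # A wrong prediction on the perturbed instance itself:
        # a "false" adversarial defect (accuracy is deceived).
        if predict(x + delta) != y_true:
            return False
        # A wrong prediction on a similar counterpart under the same
        # perturbation: a "biased" defect (individual fairness is broken).
        for xs in similar_xs:
            if predict(np.asarray(xs, dtype=float) + delta) != y_true:
                return False
    return True
```

Under this sketch, a RAFair-style attack would search for a perturbation that makes the check fail, while a benign corrective perturbation would shift an adversarial instance back into the region where the check passes.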