Discriminative Adversarial Unlearning
CoRR (2024)
Abstract
We introduce a novel machine unlearning framework founded upon the
established principles of the min-max optimization paradigm. We capitalize on
the capabilities of strong Membership Inference Attacks (MIA) to facilitate the
unlearning of specific samples from a trained model. We consider the scenario
of two networks, the attacker 𝐀 and the trained defender
𝐃 pitted against each other in an adversarial objective, wherein the
attacker aims at teasing out the information of the data to be unlearned in
order to infer membership, and the defender unlearns to defend the network
against the attack, whilst preserving its general performance. The algorithm
can be trained end-to-end using backpropagation, following the well-known
iterative min-max approach of alternately updating the attacker and the defender. We
additionally incorporate a self-supervised objective that addresses the
feature-space discrepancies between the forget set and the validation set,
enhancing unlearning performance. Our proposed algorithm closely approximates
the ideal benchmark of retraining from scratch for both random sample
forgetting and class-wise forgetting schemes on standard machine-unlearning
datasets. Specifically, on the class-wise forgetting scheme, the method achieves
near-optimal performance, and on the random sample forgetting scheme it
outperforms known methods across all metrics and multiple network pruning
strategies.
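
The alternating attacker/defender updates described in the abstract can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the defender is a one-parameter linear model `f(x) = w * x` with squared error, the attacker is a logistic model that guesses membership from the defender's per-sample loss, gradients are derived by hand instead of via backpropagation, only the forget samples carry the adversarial gradient, and the self-supervised objective is omitted. The names `attacker_step`, `defender_step`, `unlearn`, and the weight `lam` are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_loss(w, x, y):
    # Squared error of the toy defender f(x) = w * x (an assumed stand-in model).
    r = w * x - y
    return r * r


def attacker_bce(att, losses, members):
    # Binary cross-entropy of the attacker's membership guesses.
    a, b = att
    total = 0.0
    for l, m in zip(losses, members):
        p = min(max(sigmoid(a * l + b), 1e-12), 1.0 - 1e-12)
        total -= m * math.log(p) + (1 - m) * math.log(1.0 - p)
    return total / len(losses)


def attacker_step(att, losses, members, lr):
    # Attacker update: one gradient-descent step on its own BCE,
    # i.e. it gets better at telling forget samples (1) from validation samples (0).
    a, b = att
    ga = gb = 0.0
    for l, m in zip(losses, members):
        err = sigmoid(a * l + b) - m
        ga += err * l
        gb += err
    n = len(losses)
    return (a - lr * ga / n, b - lr * gb / n)


def defender_step(w, att, forget, retain, lr, lam):
    # Defender update: descend the retain loss (preserve general performance)
    # while ascending the attacker's loss on the forget samples (hide membership).
    a, b = att
    g_retain = sum(2.0 * (w * x - y) * x for x, y in retain) / len(retain)
    g_attack = 0.0
    for x, y in forget:
        p = sigmoid(a * sample_loss(w, x, y) + b)
        # d/dw of -log(p) for a sample the attacker should label "member".
        g_attack += (p - 1.0) * a * 2.0 * (w * x - y) * x
    g_attack /= len(forget)
    return w - lr * (g_retain - lam * g_attack)


def unlearn(w, forget, retain, val, rounds=50, lr=0.05, lam=0.5):
    # Iterative min-max loop: attacker step, then defender step, repeated.
    att = (0.0, 0.0)
    members = [1] * len(forget) + [0] * len(val)
    for _ in range(rounds):
        losses = [sample_loss(w, x, y) for x, y in forget + val]
        att = attacker_step(att, losses, members, lr)
        w = defender_step(w, att, forget, retain, lr, lam)
    return w, att
```

A call such as `unlearn(2.0, forget, retain, val)`, with each argument after the first a list of `(x, y)` pairs, returns the updated defender weight and the attacker parameters. In a realistic setting both players would be neural networks trained with an automatic-differentiation library; the sketch only shows the shape of the alternating objective.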