In Search of Smooth Minima for Purifying Backdoor in Deep Neural Networks

ICLR 2023

The success of a deep neural network (DNN) relies heavily on the details of the training scheme, e.g., training data, architecture, and hyper-parameters. Recent backdoor attacks suggest that an adversary can take advantage of such training details and compromise the integrity of a DNN. Our studies show that a backdoor model is usually optimized to a bad local minimum, i.e., a sharper minimum than that of a benign model. Intuitively, a backdoor can be purified by re-optimizing the model to a smoother minimum through fine-tuning with a small amount of clean validation data. However, fine-tuning all DNN parameters often incurs a high computational cost and yields sub-par clean test performance. To address this concern, we propose a novel backdoor purification technique, Natural Gradient Fine-tuning (NGF), which removes the backdoor by fine-tuning only one layer. Specifically, NGF uses a loss-surface-geometry-aware optimizer that can overcome the challenge of reaching a smooth minimum in the one-layer optimization scenario. To enhance the generalization performance of the proposed method, we introduce a clean-data-distribution-aware regularizer based on knowledge of the loss surface curvature matrix, i.e., the Fisher Information Matrix. To validate the effectiveness of our method, we conduct extensive experiments with four datasets (CIFAR10, GTSRB, Tiny-ImageNet, and ImageNet) and 11 recent backdoor attacks, e.g., Blend, Dynamic, and Clean Label. NGF achieves state-of-the-art performance on most of these benchmarks.
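The abstract does not spell out the NGF update rule, so the following is only a minimal sketch of a generic natural-gradient step applied to a single trainable layer, not the authors' implementation. It assumes the trainable layer is a logistic-regression head on fixed features (standing in for the frozen layers), approximates the Fisher Information Matrix by the diagonal of the empirical Fisher (the mean of squared per-example gradients), and omits the regularizer. The function names, the `damping` term, and the toy data are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mean_loss(w, feats, labels):
    # Mean binary cross-entropy of the single trainable layer.
    eps = 1e-12
    total = 0.0
    for x, y in zip(feats, labels):
        p = sigmoid(logit(w, x))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(feats)


def natural_gradient_step(w, feats, labels, lr=0.1, damping=1e-3):
    # Per-example gradients of the loss with respect to the layer weights.
    grads = [[(sigmoid(logit(w, x)) - y) * xi for xi in x]
             for x, y in zip(feats, labels)]
    n, d = len(grads), len(w)
    mean_grad = [sum(g[j] for g in grads) / n for j in range(d)]
    # Assumption: diagonal empirical Fisher, i.e. the mean squared gradient.
    fisher_diag = [sum(g[j] ** 2 for g in grads) / n for j in range(d)]
    # Precondition the gradient by the (damped) inverse Fisher diagonal.
    return [w[j] - lr * mean_grad[j] / (fisher_diag[j] + damping)
            for j in range(d)]


# Hypothetical "clean validation" features (bias term first) and labels.
feats = [[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]]
labels = [1, 1, 0, 0]

w0 = [0.0, 0.0]
w1 = natural_gradient_step(w0, feats, labels)
w5 = w1
for _ in range(4):
    w5 = natural_gradient_step(w5, feats, labels)
```

Only the weights of the one layer are updated here; everything that produces `feats` is treated as frozen. A full Fisher matrix or a structured approximation would replace the element-wise division with a linear solve.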
AI Security, Backdoor or Trojan Attacks on Deep Networks, Safe and Robust AI