Robustness with Query-efficient Adversarial Attack using Reinforcement Learning.

CVPR Workshops (2023)

Abstract
A measure of robustness against naturally occurring distortions is key to the safety, success, and trustworthiness of machine learning models in deployment. We propose an adversarial black-box attack that adds minimal Gaussian noise distortions to input images to make machine learning models misclassify them. We use a Reinforcement Learning (RL) agent as a smart hacker that explores the input images and adds minimal distortions to the most sensitive regions to induce misclassification. The agent also employs a smart policy to remove previously introduced noise that has less impact on the trained model at a given state. This novel approach is equivalent to a deep tree search for where to add noise, but without an exhaustive search, leading to faster and optimal convergence. The attack also effectively measures the robustness of image classification models, because it induces misclassification with a minimum L2 distortion of Gaussian noise, which resembles many naturally occurring distortions. Furthermore, the proposed black-box L2 adversarial attack tool beats state-of-the-art competitors in the average number of queries by a significant margin, with a 100% success rate, while maintaining a very competitive L2 score despite limiting distortions to Gaussian noise. On the ImageNet dataset, the average number of queries achieved by the proposed method for the ResNet-50, Inception-V3, and VGG-16 models is 42%, 32%, and 31% better, respectively, than that of the state-of-the-art "Square-Attack" approach, while maintaining a competitive L2. Demo: https://tinyurl.com/yr8f7x9t
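The abstract does not spell out the agent's policy, so the following is only a minimal sketch of the loop it describes: add Gaussian noise to image regions, remove noise that turns out to be unnecessary, and report the query count and L2 distortion. It is not the authors' method. Uniformly random patch selection stands in for the RL agent, and every name here (`QueryCounter`, `make_toy_model`, `patch_noise_attack`) as well as the patch size, `sigma`, and the linear toy classifier are assumptions made for illustration.

```python
import numpy as np


class QueryCounter:
    """Wraps a black-box model and counts how many times it is queried."""

    def __init__(self, predict_proba):
        self.predict_proba = predict_proba
        self.queries = 0

    def __call__(self, image):
        self.queries += 1
        return self.predict_proba(image)


def make_toy_model(shape, n_classes=3, seed=0):
    """Hypothetical stand-in classifier: fixed random linear layer + softmax."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(n_classes, shape[0] * shape[1]))

    def predict_proba(image):
        logits = weights @ image.ravel()
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()

    return predict_proba


def patch_noise_attack(model, image, label, patch=4, sigma=0.1,
                       max_queries=500, seed=0):
    """Add Gaussian noise patch by patch, then undo patches that are not needed."""
    rng = np.random.default_rng(seed)
    height, width = image.shape
    corners = [(r, c) for r in range(0, height, patch)
               for c in range(0, width, patch)]
    adv = image.copy()
    best = model(adv)[label]  # true-class probability of the current image
    success = False

    # Phase 1: add noise. Random patch choice stands in for the RL policy.
    while model.queries < max_queries and not success:
        r, c = corners[rng.integers(len(corners))]
        trial = adv.copy()
        block = trial[r:r + patch, c:c + patch]
        trial[r:r + patch, c:c + patch] = np.clip(
            block + rng.normal(0.0, sigma, size=block.shape), 0.0, 1.0)
        probs = model(trial)
        if probs[label] < best:  # keep noise only if it hurts the true class
            adv, best = trial, probs[label]
            success = int(np.argmax(probs)) != label

    # Phase 2: restore original pixels wherever misclassification survives.
    if success:
        for r, c in corners:
            if model.queries >= max_queries:
                break
            trial = adv.copy()
            trial[r:r + patch, c:c + patch] = image[r:r + patch, c:c + patch]
            if np.array_equal(trial, adv):
                continue  # patch was never perturbed; do not spend a query
            if int(np.argmax(model(trial))) != label:
                adv = trial

    l2 = float(np.linalg.norm(adv - image))
    return adv, success, model.queries, l2


# Usage on a random 16x16 "image" with pixel values in [0, 1).
x = np.random.default_rng(1).uniform(0.0, 1.0, size=(16, 16))
model = QueryCounter(make_toy_model(x.shape))
y = int(np.argmax(model(x)))  # treat the clean prediction as the true label
adv, success, queries, l2 = patch_noise_attack(model, x, y)
print(f"success={success} queries={queries} L2={l2:.3f}")
```

The removal pass is what keeps the L2 distortion small in this sketch: any patch whose noise can be undone without restoring the correct prediction is reset to the original pixels. A learned policy, as described in the abstract, would aim to choose both the additions and the removals with fewer queries than this random baseline.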
Keywords
100% success rate, adversarial attack method, black-box attack, black-box L2 adversarial attack tool beats state-of-the-art competitors, deep tree search, exhaustive search, image classification models, input images, machine learning models, minimum distortions, minimum Gaussian noise distortions, misclassification inducing minimum L2 distortion, naturally occurring distortions, query-efficient adversarial, reinforcement learning, smart hacker, smart policy, Square-Attack, trained model, VGG-16 models