BFS2Adv: Black-Box Adversarial Attack Towards Hard-to-Attack Short Texts

Computers & Security (2024)

Abstract
The advent of Machine Learning as a Service (MLaaS) and deep learning applications has increased the susceptibility of models to adversarial textual attacks, particularly in black-box settings. Prior work on black-box adversarial textual attacks generally follows a stable strategy that leverages char-level, word-level, and sentence-level perturbations, together with queries to the target model, to find adversarial examples in the search space. However, existing approaches prioritize query efficiency by reducing the search space, thereby overlooking hard-to-attack textual instances. To address this issue, we propose BFS2Adv, a brute-force algorithm that generates adversarial examples for both easy-to-attack and hard-to-attack textual inputs. Starting from an original text, BFS2Adv employs word-level perturbations and synonym substitution to construct a comprehensive search space, with each node representing a potential adversarial example. The algorithm systematically explores this space through a breadth-first search, combined with queries to the target model, to effectively identify qualified adversarial examples. We implemented and evaluated a prototype of BFS2Adv against renowned models such as ALBERT and BERT, using the SNLI and MR datasets. Our results demonstrate that BFS2Adv outperforms state-of-the-art algorithms and effectively improves the success rate of short-text adversarial attacks. Furthermore, we provide detailed insights into the robustness of BFS2Adv by analyzing those hard-to-attack examples.
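The search procedure described above (word-level synonym substitution expanded breadth-first, with each candidate checked by a query to the black-box model) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `synonyms` dictionary, the `query_model` callable, and the `max_queries` budget are all assumed placeholders.

```python
from collections import deque


def bfs2adv_sketch(tokens, synonyms, query_model, true_label, max_queries=1000):
    """Hypothetical sketch of a BFS over synonym substitutions.

    tokens      -- original text as a list of words
    synonyms    -- dict mapping a word to its candidate substitutes (assumed input)
    query_model -- black-box scorer: list of words -> predicted label (assumed input)
    true_label  -- label the model assigns to the original text
    """
    start = tuple(tokens)
    queue = deque([start])
    visited = {start}
    queries = 0
    while queue and queries < max_queries:
        current = queue.popleft()
        # Expand the node: substitute each position with each synonym
        # (word-level perturbation), producing child candidates.
        for i, word in enumerate(current):
            if queries >= max_queries:
                break
            for sub in synonyms.get(word, []):
                candidate = current[:i] + (sub,) + current[i + 1:]
                if candidate in visited:
                    continue
                visited.add(candidate)
                queries += 1
                # A candidate whose predicted label flips is a
                # qualified adversarial example.
                if query_model(list(candidate)) != true_label:
                    return list(candidate), queries
                queue.append(candidate)
                if queries >= max_queries:
                    break
    return None, queries  # no adversarial example found within the query budget
```

Because the frontier is explored level by level, the first adversarial example returned uses the fewest substitutions reachable from the original text, at the cost of an exhaustive (brute-force) exploration that makes hard-to-attack inputs reachable where greedy search-space pruning would miss them.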
Keywords
Text Classification, Adversarial Attack, Score-based Adversarial Attack, Hard-to-attack Examples