Query-Efficient Black-Box Adversarial Attacks on Automatic Speech Recognition.

IEEE/ACM Trans. Audio Speech Lang. Process. (2023)

Abstract
The susceptibility of Deep Neural Networks (DNNs) to adversarial attacks has raised concerns regarding their practical applications in real-world scenarios. Although the vulnerability of DNNs to adversarial attacks has been extensively studied in the image domain, research in the audio domain, particularly in the black-box setting with Automatic Speech Recognition (ASR) models, remains limited. While various black-box attacks have been proposed for ASR models, such as transfer attacks, hardware attacks, and query-based attacks, this study concentrates on query-based black-box attacks. The article introduces a new gradient estimation technique, Temporal Natural Evolution Strategies (T-NES), to generate adversarial audio samples more efficiently than existing attacks. T-NES leverages the temporal correlation present in audio to speed up gradient estimation based on the probability scores returned by the target model. The empirical results on two benchmark datasets, LibriSpeech and TEDLIUM, and two state-of-the-art ASR models, DeepSpeech2 and Wav2Letter, demonstrate that T-NES generates successful attacks with up to 30% fewer queries than existing attacks, within a budget of 500 queries. T-NES could provide a robust baseline for evaluating the black-box adversarial vulnerability of ASR systems.
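The abstract describes T-NES as an NES-style, score-based gradient estimator that exploits temporal correlation in audio. The paper's exact construction is not given here, so the sketch below is only a plausible illustration: standard NES with antithetic sampling, where each perturbation is drawn in a coarser space and repeated along the time axis to make the noise temporally correlated. The function name `nes_gradient`, the `tile` parameter, and the quadratic test loss are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def nes_gradient(loss_fn, x, sigma=1e-3, n_samples=20, tile=4, rng=None):
    """Sketch of an NES-style gradient estimate at waveform x using
    antithetic sampling. Temporal correlation is injected by drawing
    noise in a coarser space and repeating it `tile` times along the
    time axis -- an assumed mechanism, not the paper's exact one."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_samples // 2):
        # coarse noise, upsampled to full length -> temporally correlated
        coarse = rng.standard_normal(len(x) // tile + 1)
        u = np.repeat(coarse, tile)[: len(x)]
        # antithetic pair: one query at x + sigma*u, one at x - sigma*u
        grad += (loss_fn(x + sigma * u) - loss_fn(x - sigma * u)) * u
    return grad / (n_samples * sigma)
```

The query savings reported in the abstract would come from the coarser noise space: each estimate explores fewer effective dimensions, so fewer model queries are needed for a usable gradient direction, at the cost of only recovering a temporally smoothed gradient.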
Keywords
Closed box, Estimation, Computational modeling, Hardware, Glass box, Acoustics, Feature extraction, Adversarial attack, automatic speech recognition, adversarial robustness