Enhancing Synthesized Speech Detection with Dual Attention Using Features Fusion

Bo Wang, Yanyan Ma, Yeling Tang, Rui Wang, Maozhen Zhang

2023 International Conference on Computer Applications Technology (CCAT), 2023

Abstract
Automatic speaker verification (ASV) is a speech-based technology for identity verification. As synthetic speech generation improves, however, generated speech is becoming increasingly realistic, and malicious actors can use it to deceive ASV systems, which poses a serious threat. Although many synthetic speech detection algorithms have been proposed, their generalization performance is still not ideal, so ASV systems may fail when faced with previously unseen synthetic speech attacks. New detection methods are therefore needed to make ASV systems more robust against such attacks. To extract richer and more reliable speech features, this paper proposes a dual-attention network: features extracted by Wav2vec are fused with traditional log-Mel (Logmel) features through self-attention, and the fused features are fed into a ResNet with a convolutional block attention module (CBAM). The method achieves competitive performance on the ASVspoof 2021 LA and DF datasets, with a t-DCF of 0.3008 on LA and an EER of 3.9% on DF, indicating good generalization performance.
Keywords
Wav2vec, Logmel, synthetic speech, ResNet, CBAM
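
The abstract reports an equal error rate (EER) on the DF dataset. The EER is the operating point at which the false acceptance rate (spoofed speech accepted as genuine) equals the false rejection rate (genuine speech rejected). The following is a minimal sketch of how such a value can be computed from detector scores. It is not the authors' evaluation code; the function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold), assuming higher scores mean 'more bona fide'.

    Hypothetical helper for illustration: it sweeps every observed score as a
    candidate threshold and keeps the one where the two error rates are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: bona fide speech scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed speech scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (made up): one bona fide utterance scores below one spoofed one.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On real evaluation sets the two error rates rarely cross exactly at an observed score, so evaluation toolkits typically interpolate between neighboring thresholds; the midpoint used here is a simplification.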