One-Class Neural Network With Directed Statistics Pooling for Spoofing Speech Detection

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY (2024)

Abstract
Existing deep learning models for spoofing speech detection often generalize poorly to spoofing attacks that were not seen during training. Class imbalance compounds the problem by biasing the learning process towards the seen attack samples. To address these challenges, we present an end-to-end model called One-Class Neural Network with Directed Statistics Pooling (OCNet-DSP). The model applies a feature cropping operation that attenuates high-frequency components, which reduces the risk of overfitting. Exploiting the time-frequency characteristics of speech signals, we introduce a directed statistics pooling layer that extracts more effective features for distinguishing between the bonafide and spoofing classes. We also propose the Threshold One-class Softmax loss, which mitigates class imbalance by reducing the optimization weight of spoofing samples during training. Extensive comparative results show that the proposed model outperforms all existing single models, achieving an equal error rate of 0.44% and a minimum detection cost function of 0.0145 on the ASVspoof 2019 logical access database. The proposed ensemble version, in which each submodel accommodates speech inputs of a different length, maintains state-of-the-art performance among reproducible ensemble models. Numerous ablation experiments and a cross-dataset experiment are conducted to validate the rationality and effectiveness of the proposed model.
Keywords
Feature extraction, Spectrogram, Training, Filter banks, Voice activity detection, Convolution, Time-domain analysis, Spoofing speech detection, ASVspoof 2019, one-class, directed statistics pooling
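
The abstract reports results as an equal error rate (EER), the operating point at which the false acceptance rate (spoofed speech accepted as bonafide) equals the false rejection rate (bonafide speech rejected). The following is a minimal sketch of how an EER can be estimated from detection scores. It is not the official ASVspoof scoring tool, and the function name, the score convention (higher means more likely bonafide), and the example scores are assumptions made for illustration only.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER for a detector whose higher scores mean 'bonafide'.

    Hypothetical helper: sweeps every observed score as a candidate
    threshold and returns (eer, threshold) at the point where the false
    acceptance and false rejection rates are closest. With finite data
    the two rates may never be exactly equal, so their average is used.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: a bonafide trial scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed trial scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up scores, not data from the paper.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With these toy scores, one bonafide trial falls below the threshold 0.6 and one spoofed trial reaches it, so both error rates are 25%. Real evaluations use many more trials and typically interpolate between thresholds rather than picking the nearest observed score.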