Compact Time-Domain Representation for Logical Access Spoofed Audio

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2024)

Abstract
Anti-spoofing is the task of speech authentication, that is, distinguishing genuine human speech from spoofed speech. The main focus of this paper is to propose new representations for genuine and spoofed speech, based on estimating the probability mass function (PMF) of the audio waveform's amplitude. We introduce a new feature extraction method for speech audio signals: unlike traditional methods, our method operates directly on the time-domain audio samples. The PMF is exploited by designing a feature extractor based on several PMF distances and similarity measures. As an additional step, we apply filterbank preprocessing, which significantly affects the discriminative characteristics of the features and enables convenient visualization of possible clusters of spoofing attacks. Furthermore, we use diffusion maps to reveal the underlying manifold on which the data lie. The suggested embeddings allow simple linear separators to achieve an Equal Error Rate (EER) of 12.99% on the ASVspoof2019 Logical Access (LA) test set for female samples, and 12.09% for male samples. In addition, we present a convenient way to visualize the data, which helps assess the efficiency of different spoofing techniques. We also present a reduced-complexity embedding method based on compander quantization, which in some cases even improves the EER on the test set (up to 3.00%). The experimental results show the potential of multichannel PMF-based features for the anti-spoofing task, as well as the benefits of diffusion maps both as an analysis tool and as an embedding tool.
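The abstract does not state the histogram resolution, the filterbank design, or exactly which PMF distances are used. As an illustration only, the Python sketch below estimates the amplitude PMF of time-domain samples with a uniform histogram, splits the signal with a hypothetical two-band Haar-like filterbank (`two_band_split`), and compares each channel's PMF with a per-channel reference PMF using total variation distance, Hellinger distance, and the Bhattacharyya coefficient. All function names, the bin count, and the assumption that samples are normalized to [-1, 1] are illustrative choices, not the paper's.

```python
import math
from typing import List, Sequence


def amplitude_pmf(samples: Sequence[float], num_bins: int = 64) -> List[float]:
    """Histogram estimate of the amplitude PMF, assuming samples normalized to [-1, 1]."""
    counts = [0] * num_bins
    for x in samples:
        x = min(max(x, -1.0), 1.0)  # clip to the assumed amplitude range
        idx = min(int((x + 1.0) / 2.0 * num_bins), num_bins - 1)
        counts[idx] += 1
    total = float(len(samples))
    return [c / total for c in counts]


def two_band_split(samples: Sequence[float]) -> List[List[float]]:
    """Hypothetical 2-channel filterbank: Haar-like low-pass and high-pass outputs."""
    prev = 0.0
    low, high = [], []
    for x in samples:
        low.append(0.5 * (x + prev))   # average of neighbouring samples
        high.append(0.5 * (x - prev))  # half the first difference
        prev = x
    return [low, high]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def bhattacharyya(p: Sequence[float], q: Sequence[float]) -> float:
    # Similarity measure in [0, 1]; equals 1 for identical PMFs
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def multichannel_pmf_features(samples: Sequence[float],
                              reference_pmfs: Sequence[Sequence[float]],
                              num_bins: int = 64) -> List[float]:
    """For each filterbank channel, compare its PMF with that channel's reference PMF."""
    feats: List[float] = []
    for band, ref in zip(two_band_split(samples), reference_pmfs):
        p = amplitude_pmf(band, num_bins)
        feats += [total_variation(p, ref), hellinger(p, ref), bhattacharyya(p, ref)]
    return feats
```

In a setup like this, `reference_pmfs` could be, for example, the average per-channel PMF of genuine training utterances; the resulting distance vector is the kind of compact feature that a simple linear classifier can consume.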
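Compander quantization reduces the number of amplitude levels before the PMF is estimated, which shrinks the feature dimension. The abstract does not name the compander, so the sketch below assumes μ-law companding (as in ITU-T G.711, with μ = 255) followed by uniform quantization into a small number of levels. The function names and default parameters are hypothetical. The point of the nonlinear compander is to keep fine resolution at low amplitudes, where most speech samples fall, while using few levels overall.

```python
import math
from typing import List, Sequence


def mu_law_compress(x: float, mu: float = 255.0) -> float:
    """Assumed compander: mu-law mapping of [-1, 1] onto [-1, 1]."""
    x = min(max(x, -1.0), 1.0)
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def companded_pmf(samples: Sequence[float], num_levels: int = 16,
                  mu: float = 255.0) -> List[float]:
    """PMF over num_levels uniform cells in the companded domain."""
    counts = [0] * num_levels
    for x in samples:
        y = mu_law_compress(x, mu)  # y lies in [-1, 1]
        idx = min(int((y + 1.0) / 2.0 * num_levels), num_levels - 1)
        counts[idx] += 1
    n = float(len(samples))
    return [c / n for c in counts]
```

For a quiet signal confined to roughly [-0.05, 0.05], a uniform 16-level quantizer over [-1, 1] would populate only two levels, whereas the μ-law version spreads the same samples over several levels, so the reduced PMF still carries shape information.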
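Diffusion maps build a Markov chain on a kernel graph over the feature vectors; Euclidean distance between the resulting embedding coordinates approximates diffusion distance on that graph. The following minimal pure-Python sketch computes only the first non-trivial diffusion coordinate. It assumes a Gaussian kernel with bandwidth `eps` and uses power iteration on the symmetric conjugate S = D^{-1/2} W D^{-1/2} of the Markov matrix P = D^{-1} W, deflating the trivial eigenvector. The kernel choice, `eps`, and the single-coordinate output are assumptions for illustration; a practical implementation would compute several coordinates with a dense eigensolver.

```python
import math
from typing import List, Sequence, Tuple


def diffusion_coordinate(points: Sequence[Sequence[float]], eps: float = 1.0,
                         t: int = 1, iters: int = 200) -> Tuple[float, List[float]]:
    """First non-trivial diffusion-map coordinate lambda_1**t * psi_1 (Gaussian kernel)."""
    n = len(points)
    # Gaussian affinities W_ij = exp(-||x_i - x_j||^2 / eps)
    w = [[math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / eps)
          for j in range(n)] for i in range(n)]
    d = [sum(row) for row in w]  # node degrees
    # S = D^{-1/2} W D^{-1/2} is symmetric and has the same eigenvalues as P = D^{-1} W
    s = [[w[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    # Trivial eigenvector of S (eigenvalue 1) is sqrt(d), normalized to unit length
    norm0 = math.sqrt(sum(d))
    v0 = [math.sqrt(di) / norm0 for di in d]

    def deflate(vec: List[float]) -> List[float]:
        # Remove the component along the trivial eigenvector
        proj = sum(a * b for a, b in zip(vec, v0))
        return [a - proj * b for a, b in zip(vec, v0)]

    def matvec(vec: List[float]) -> List[float]:
        return [sum(s[i][j] * vec[j] for j in range(n)) for i in range(n)]

    def normalize(vec: List[float]) -> List[float]:
        nrm = math.sqrt(sum(x * x for x in vec))
        return [x / nrm for x in vec]

    # Gaussian kernel => S is positive semidefinite, so power iteration in the
    # complement of v0 converges to the eigenvector with the second-largest eigenvalue
    v = normalize(deflate([float(i + 1) for i in range(n)]))
    for _ in range(iters):
        v = normalize(deflate(matvec(v)))
    lam = sum(a * b for a, b in zip(v, matvec(v)))  # Rayleigh quotient (unit-norm v)
    # Map back to a right eigenvector of P: psi = D^{-1/2} v
    psi = [vi / math.sqrt(di) for vi, di in zip(v, d)]
    return lam, [lam ** t * p for p in psi]
```

For two well-separated groups of feature vectors, this coordinate takes one sign on the first group and the opposite sign on the second, which is the kind of structure that lets a simple linear threshold separate classes in the embedding.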
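The Equal Error Rate reported above is the operating point at which the false acceptance rate (spoofed trials accepted) equals the false rejection rate (genuine trials rejected). A simple threshold-sweep approximation is sketched below, assuming that higher scores indicate genuine speech and that a trial is accepted when its score is at least the threshold; the official ASVspoof scoring tools may interpolate the crossing point differently.

```python
from typing import Sequence


def equal_error_rate(genuine: Sequence[float], spoof: Sequence[float]) -> float:
    """Approximate EER by sweeping the decision threshold over all observed scores."""
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(genuine) | set(spoof)):
        frr = sum(1 for s in genuine if s < thr) / len(genuine)  # genuine rejected
        far = sum(1 for s in spoof if s >= thr) / len(spoof)     # spoof accepted
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest
            best_gap, eer = gap, 0.5 * (far + frr)
    return eer
```

For example, `equal_error_rate([0.2, 0.6, 0.8, 0.9], [0.1, 0.3, 0.5, 0.7])` returns 0.25, i.e. a 25% EER; multiplying by 100 gives the percentage form used in the abstract.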
Keywords
Anti-spoofing, compander, diffusion maps, speech embedding, speech probability mass function