Deep correlation network for synthetic speech detection

Chen Chen, Bohan Dai, Bochao Bai, Deyun Chen

Applied Soft Computing (2024)

Abstract
Synthetic speech is becoming increasingly rampant, and automatic speaker verification (ASV) systems are vulnerable to attacks that use it. However, most current synthetic speech detection methods focus on the influence of a single type of feature on detection. Since different features each capture, to some extent, the difference between real and synthetic speech, there must be common information shared between different types of features. Effectively finding and fully utilizing this information will facilitate the extraction of more discriminative features and improve performance. Based on this analysis, we propose a deep correlation network (DCN) to learn the latent common information between different embeddings. It consists of two parts: the bi-parallel network and the correlation learning network. The bi-parallel network consists of different neural models that learn middle-level representations from front-end acoustic features. The correlation learning network is the core part of the DCN and is designed to explore the common information between these middle-level representations. The common information obtained after DCN processing has better discriminative ability for synthetic speech detection. Experimental results show that the proposed DCN can significantly improve the performance of synthetic speech detection systems on the ASVspoof 2019 and ASVspoof 2021 logical access sub-challenges.
Keywords
Synthetic speech detection, Deep correlation network, Correlation learning network, Common embedding
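
The abstract describes two parallel branches whose middle-level embeddings are then compared for shared information. As a rough illustration only, the sketch below uses NumPy to pass two feature batches through two hypothetical branches and measures how correlated the resulting embeddings are. Everything here is an assumption made for illustration: the `branch` function (a single linear map with `tanh`), the feature and embedding sizes, the random inputs, and the use of mean per-dimension Pearson correlation as the measure. The abstract does not specify the authors' network architecture or correlation objective, so this should not be read as their method.

```python
import numpy as np


def branch(features, weights):
    """Hypothetical stand-in for one parallel branch: linear map + tanh."""
    return np.tanh(features @ weights)


def mean_correlation(a, b, eps=1e-8):
    """Mean per-dimension Pearson correlation between two embedding batches.

    a, b: arrays of shape (batch, dim). Returns a float in [-1, 1].
    """
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    num = (a * b).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0)) + eps
    return float((num / den).mean())


rng = np.random.default_rng(0)

# Two assumed front-end feature types for the same 32 utterances.
feats_a = rng.normal(size=(32, 60))
feats_b = rng.normal(size=(32, 80))

# Randomly initialised (untrained) branch weights, 16-dimensional embeddings.
w_a = rng.normal(scale=0.1, size=(60, 16))
w_b = rng.normal(scale=0.1, size=(80, 16))

emb_a = branch(feats_a, w_a)
emb_b = branch(feats_b, w_b)

# A correlation-based objective could, for example, try to increase this value.
score = mean_correlation(emb_a, emb_b)
print(f"mean correlation between branch embeddings: {score:.3f}")
```

With untrained weights and unrelated random inputs the score is expected to be near zero; a training procedure built on this kind of measure would adjust the branches so that embeddings of the same utterance become more strongly correlated.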