Audio Transformer for Synthetic Speech Detection via Formant Magnitude and Phase Analysis

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Abstract
This paper introduces a novel multi-task transformer for synthetic speech detection. The network encodes the magnitude and phase of the input speech with a feature bottleneck, which is used to autoencode the input magnitude, to predict the trajectory of the fundamental frequency (f0), and to classify the input speech as synthetic or natural. The approach achieves state-of-the-art performance on the ASVspoof 2019 LA dataset while retaining interpretability, with an AUC score of 0.910.
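
The AUC figure quoted in the abstract is the area under the ROC curve of the synthetic-versus-natural classifier. As a minimal sketch of how such a score can be computed, the snippet below uses the pairwise (Mann-Whitney) formulation: the fraction of (synthetic, natural) pairs in which the synthetic utterance receives the higher score, with ties counted as one half. The function name `roc_auc`, the label convention (1 = synthetic, 0 = natural), and the toy scores are assumptions made for illustration; they are not taken from the paper or from its evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 (assumed here: 1 = synthetic, 0 = natural)
    scores: iterable of floats, higher means "more likely synthetic"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # synthetic sample correctly ranked higher
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example with made-up detector scores (not results from the paper).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.889 (8 of the 9 pairs are ranked correctly)
```

The double loop is quadratic in the number of utterances, which is fine for a sketch. On a full evaluation set, a rank-based implementation such as the one in a standard metrics library would be the more practical choice.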
Keywords
synthetic speech detection,audio deepfakes,audio transformer,voice formants