MDRT: Multi-Domain Synthetic Speech Localization

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Abstract
With recent advances in speech synthesis, tools that generate high-quality synthetic speech impersonating any human speaker are easily available. Several incidents have been reported in which high-quality synthetic speech was misused to spread misinformation and to commit large-scale financial fraud. Many methods have been proposed for detecting synthetic speech; however, there is limited work on localizing the synthetic segments within a speech signal. In this work, our goal is to localize the synthetic speech segments in a partially synthetic speech signal. Most existing methods for synthetic speech localization obtain features from either the time-domain waveform or the spectrogram representation of the speech signal. We propose the Multi-Domain ResNet Transformer (MDRT), which obtains multi-domain features from both the time domain and the spectrogram representation of a speech signal to localize synthetic speech segments. MDRT uses transformer neural networks to obtain multi-domain features and processes them using a ResNet-style neural network. We use the PartialSpoof dataset to examine the performance of MDRT on localizing synthetic speech segments of varying duration. Our results show that MDRT performs better than several existing synthetic speech localization methods.
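To make the localization task concrete, the following is a minimal sketch of how per-frame decisions could be merged into time spans. It is not the MDRT method: the function name `frames_to_segments`, the 0/1 label convention (1 meaning synthetic), and the 20 ms frame duration are all assumptions chosen for illustration.

```python
from itertools import groupby


def frames_to_segments(frame_labels, frame_sec=0.02):
    """Merge runs of frames labelled 1 (synthetic) into (start_s, end_s) spans.

    frame_sec is a hypothetical frame duration, not a value from the paper.
    """
    segments = []
    pos = 0  # index of the first frame in the current run
    for label, run in groupby(frame_labels):
        n = len(list(run))
        if label == 1:
            start = round(pos * frame_sec, 6)
            end = round((pos + n) * frame_sec, 6)
            segments.append((start, end))
        pos += n
    return segments


if __name__ == "__main__":
    # Hypothetical frame-level predictions for a partially synthetic utterance.
    labels = [0, 0, 1, 1, 1, 0, 1, 0]
    print(frames_to_segments(labels))
```

The time resolution of the recovered spans is bounded by the frame duration, which is one reason localizing segments of varying duration is harder than labelling a whole utterance.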