Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models
arXiv (2024)
Abstract
Generalization is a major issue for current audio deepfake detectors, which
struggle to provide reliable results on out-of-distribution data. Given the
speed at which ever more accurate synthesis methods are developed, it is
essential to design techniques that also work well on data they were not
trained for. In this paper we study the potential of large-scale pre-trained
models for audio deepfake detection, with special focus on generalization
ability. To this end, the detection problem is reformulated in a speaker
verification framework, and fake audio is exposed by the mismatch between the
voice sample under test and the voice of the claimed identity. With this
paradigm, no fake speech sample is necessary for training, cutting off any link
with the generation method at the root and ensuring full generalization
ability. Features are extracted by general-purpose large pre-trained models,
with no need for training or fine-tuning on specific fake detection or speaker
verification datasets. At detection time, only a limited set of voice fragments
of the identity under test is required. Experiments on several datasets
widespread in the community show that detectors based on pre-trained models
achieve excellent performance and strong generalization ability, rivaling
supervised methods on in-distribution data and largely outperforming them on
out-of-distribution data.
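The decision rule the abstract describes can be sketched as follows: embeddings of a few reference recordings of the claimed identity are compared against the embedding of the test utterance, and a large mismatch flags the audio as fake. This is a minimal illustration only; the embedding extractor (in the paper, a large-scale pre-trained model) is abstracted away as pre-computed vectors, and the `threshold` value is a hypothetical placeholder rather than one taken from the paper.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_fake(test_emb: np.ndarray, reference_embs: np.ndarray,
            threshold: float = 0.5) -> bool:
    """Flag the test utterance as fake when its embedding is too far
    from the centroid of the claimed identity's reference embeddings.

    reference_embs: array of shape (n_references, dim), one embedding
    per genuine voice fragment of the claimed identity.
    """
    centroid = np.mean(reference_embs, axis=0)
    return cosine_similarity(test_emb, centroid) < threshold

# Toy example with synthetic 2-D "embeddings": references cluster
# around one direction; a genuine test sample points the same way,
# while a spoofed sample points elsewhere.
refs = np.array([[1.0, 0.1], [0.9, 0.0], [1.1, -0.1]])
genuine = np.array([1.0, 0.05])
spoofed = np.array([-0.2, 1.0])

print(is_fake(genuine, refs))  # genuine voice: high similarity
print(is_fake(spoofed, refs))  # mismatched voice: flagged as fake
```

Note that, consistent with the training-free paradigm, no fake sample is needed to set up the detector: only genuine reference fragments of the claimed identity and a similarity threshold.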