MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
CoRR (2024)
Abstract
Text-to-Speech (TTS) technology brings significant advantages, such as giving
a voice to those with speech impairments, but also enables audio deepfakes and
spoofs. The former mislead individuals and may propagate misinformation, while
the latter undermine voice biometric security systems. AI-based detection can
help to address these challenges by automatically differentiating between
genuine and fabricated voice recordings. However, these models are only as good
as their training data, which currently is severely limited due to an
overwhelming concentration on English and Chinese audio in anti-spoofing
databases, thus restricting their worldwide effectiveness. In response, this
paper presents the Multi-Language Audio Anti-Spoofing Dataset (MLAAD), created
using 52 TTS models, comprising 19 different architectures, to generate 160.1
hours of synthetic voice in 23 different languages. We train and evaluate three
state-of-the-art deepfake detection models with MLAAD, and observe that MLAAD
demonstrates superior performance over comparable datasets like InTheWild or
FakeOrReal when used as a training resource. Furthermore, in comparison with
the renowned ASVspoof 2019 dataset, MLAAD proves to be a complementary
resource. In tests across eight datasets, MLAAD and ASVspoof 2019 alternately
outperformed each other, both excelling on four datasets. By publishing MLAAD
and making trained models accessible via an interactive web server, we aim to
democratize anti-spoofing technology, extending it beyond the realm of
specialists, thus contributing to global efforts against audio spoofing and
deepfakes.
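Though the abstract does not name a metric, audio anti-spoofing detectors of the kind trained and compared here are conventionally evaluated by their Equal Error Rate (EER): the operating point at which the false-acceptance rate on spoofed audio equals the false-rejection rate on genuine audio. A minimal sketch of that computation, using hypothetical toy scores (higher score = more likely genuine):

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by scanning all observed scores as thresholds
    and returning the mean of FAR and FRR where their gap is smallest."""
    best_gap = float("inf")
    eer = 1.0
    for t in sorted(set(genuine_scores) | set(spoof_scores)):
        # false rejection rate: genuine recordings scored below the threshold
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # false acceptance rate: spoofed recordings scored at/above it
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        if abs(frr - far) < best_gap:
            best_gap = abs(frr - far)
            eer = (frr + far) / 2
    return eer

# Toy example (hypothetical scores, not from MLAAD):
eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
```

On these toy scores the curves cross at a threshold of 0.6, giving an EER of 0.25. A lower EER means a detector separates genuine from fabricated recordings more cleanly, which is how the cross-dataset comparisons above would typically be scored.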