Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2024)

Abstract
Articulatory copy synthesis (ACS) refers to the synthetic reproduction of natural utterances. Existing ACS methods suffer from poor generalizability to unknown speakers, high computational cost, and a lack of systematic evaluation, among other limitations. Here we propose an ACS method based on the articulatory speech synthesizer VocalTractLab (VTL) and convolutional recurrent neural networks. We first created paired articulatory-acoustic samples using VTL and then trained neural-network-based ACS models that take acoustic features as input and produce articulatory trajectories as output. The basic training approach relied on fully synthetic data and was later supplemented with natural speech paired with corresponding synthetic articulatory data. To cover as much of the articulatory and acoustic space as possible, the training samples were augmented by varying the phonation type, speaking effort, and vocal tract length of the synthetic utterances. Furthermore, two regularization methods were proposed: one based on a smoothness loss on the articulatory trajectories and another based on an acoustic loss between the original and estimated acoustic features. Given new utterances of arbitrary length, the trained ACS models estimated articulatory trajectories that were then fed into VTL to synthesize new speech. Experiments showed that the proposed ACS method achieved an average correlation coefficient of 0.983 between the reference and estimated VTL articulatory parameters for speaker-dependent German utterances. When applied to speaker-independent German, English, and Mandarin Chinese utterances, the copy-synthesized speech achieved recognition rates of 73.88%, 52.92%, and 52.41%, respectively, using the automatic speech recognizer Google Speech-to-Text.
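The abstract names a smoothness loss on articulatory trajectories and reports an average correlation coefficient between reference and estimated parameters, but gives no formulas for either. The following is a minimal sketch of one plausible reading, not the authors' implementation: it assumes the smoothness penalty is the mean squared second-order time difference and that the reported score is a Pearson correlation computed per parameter and then averaged. The function names `smoothness_loss` and `mean_correlation` and the toy two-parameter trajectory are hypothetical.

```python
import numpy as np


def smoothness_loss(traj):
    """Mean squared second-order difference along time (assumed form).

    traj: array of shape (frames, parameters).
    """
    traj = np.asarray(traj, dtype=float)
    accel = np.diff(traj, n=2, axis=0)  # discrete second derivative per parameter
    return float(np.mean(accel ** 2))


def mean_correlation(reference, estimate):
    """Pearson correlation per parameter, averaged over parameters (assumed form)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    ref_c = reference - reference.mean(axis=0)
    est_c = estimate - estimate.mean(axis=0)
    num = (ref_c * est_c).sum(axis=0)
    den = np.sqrt((ref_c ** 2).sum(axis=0) * (est_c ** 2).sum(axis=0))
    return float(np.mean(num / den))


# Toy data: two smooth "articulatory" parameters over 100 frames (illustrative only).
t = np.linspace(0.0, 1.0, 100)
reference = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)

# A jittery estimate: same trajectories plus frame-level noise.
rng = np.random.default_rng(0)
noisy = reference + 0.05 * rng.normal(size=reference.shape)

print("smoothness (reference):", smoothness_loss(reference))
print("smoothness (noisy):    ", smoothness_loss(noisy))
print("mean correlation:      ", mean_correlation(reference, noisy))
```

Under these assumptions, frame-level jitter raises the smoothness penalty sharply while barely lowering the correlation, which illustrates why a separate smoothness term can be useful alongside a trajectory-matching objective.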
Keywords
Speech inversion, copy synthesis, articulatory synthesis, VocalTractLab, convolutional recurrent neural networks