A Comparison of Text Selection Algorithms for Sequence-to-Sequence Neural TTS

2022 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC) (2022)

Abstract
Previous research demonstrated that text selection algorithms applied in the context of concatenative and parametric text-to-speech (TTS) systems can increase synthesis quality (i.e., intelligibility and naturalness). In this work, we investigate how such algorithms affect the quality of a sequence-to-sequence neural TTS system when they are used to create the training set. We compare how the mel spectrograms generated by Tacotron are affected by training sets with a total duration of six hours, created using three selection approaches: random selection, greedy selection, and a modified greedy selection that attempts to produce a uniform symbol distribution. The evaluation was done objectively using the mel-cepstral distance. It showed that random selection performed clearly worse than the two greedy approaches, which in turn came close to the results of training on about three times as much data. Of the two greedy approaches, the modified one achieved significantly better results than the standard one, making it the recommended approach for creating reading scripts. We expect these findings to help build shorter scripts and thus reduce recording costs, especially for low-resource languages for which TTS systems are to be deployed.
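The two greedy strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made for illustration only: the selection units are single characters, the budget is a number of sentences rather than a recording duration, and the "uniform symbol distribution" criterion is modeled as minimizing the Kullback-Leibler divergence KL(P || U) between the symbol distribution P of the selected set and the uniform distribution U. The function names `greedy_select`, `kl_to_uniform`, and `uniform_greedy_select` are hypothetical.

```python
import math
from collections import Counter


def greedy_select(sentences, budget):
    """Standard greedy sketch: pick the sentence that adds the most unseen symbols."""
    remaining = list(sentences)
    covered = set()
    selected = []
    while remaining and len(selected) < budget:
        # Gain = number of symbols in the sentence not yet covered.
        best = max(remaining, key=lambda s: len(set(s) - covered))
        selected.append(best)
        covered |= set(best)
        remaining.remove(best)
    return selected


def kl_to_uniform(counts, alphabet):
    """KL(P || U) for the empirical distribution P over `alphabet` (assumed criterion)."""
    total = sum(counts[a] for a in alphabet)
    if total == 0 or not alphabet:
        return math.inf
    uniform = 1.0 / len(alphabet)
    kl = 0.0
    for a in alphabet:
        p = counts[a] / total
        if p > 0:  # the term 0 * log(0) is taken as 0
            kl += p * math.log(p / uniform)
    return kl


def uniform_greedy_select(sentences, budget):
    """Modified greedy sketch: pick the sentence that keeps the symbol
    distribution of the selected set closest to uniform."""
    alphabet = sorted(set().union(*map(set, sentences)))
    remaining = list(sentences)
    counts = Counter()
    selected = []
    while remaining and len(selected) < budget:
        best = min(
            remaining,
            key=lambda s: kl_to_uniform(counts + Counter(s), alphabet),
        )
        selected.append(best)
        counts += Counter(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    corpus = ["aaa", "ab", "abc", "c"]
    print(greedy_select(corpus, 2))
    print(uniform_greedy_select(corpus, 2))
```

In this toy corpus both strategies pick "abc" first because it covers every symbol. They differ afterwards: the standard greedy criterion has no gain left to distinguish the remaining sentences, while the divergence-based criterion still prefers the sentence that keeps the symbol counts balanced.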
Keywords
Kullback-Leibler divergence, greedy, mel-cepstral distance, Tacotron, low-resource, seq2seq