Deep Neural Networks Employing Multi-Task Learning And Stacked Bottleneck Features For Speech Synthesis

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(2015)

Cited by 333 | Views 119
Abstract
Deep neural networks (DNNs) use a cascade of hidden representations to enable the learning of complex mappings from input to output features. They are able to learn the complex mapping from text-based linguistic features to speech acoustic features, and so perform text-to-speech synthesis. Recent results suggest that DNNs can produce more natural synthetic speech than conventional HMM-based statistical parametric systems. In this paper, we show that the hidden representation used within a DNN can be improved through the use of Multi-Task Learning, and that stacking multiple frames of hidden layer activations (stacked bottleneck features) also leads to improvements. Experimental results confirm the effectiveness of the proposed methods, and in listening tests we find that stacked bottleneck features in particular offer a significant improvement over both a baseline DNN and a benchmark HMM system.
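A minimal sketch of the frame-stacking idea behind stacked bottleneck features: each frame's bottleneck-layer activation is concatenated with those of its neighbouring frames to give the second-stage network wider temporal context. The window size, padding strategy, and feature dimensions below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def stack_bottleneck_features(bn, context=4):
    """Concatenate each frame with its +-context neighbouring frames.

    bn: (T, D) array of per-frame bottleneck activations.
    Returns a (T, (2*context + 1) * D) array; sequence edges are
    handled by repeating the first/last frame (an assumed choice).
    """
    T, D = bn.shape
    padded = np.pad(bn, ((context, context), (0, 0)), mode="edge")
    # Each output row is the concatenation of 2*context + 1 consecutive frames,
    # centred on the original frame.
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

# Example: 10 frames of 32-dim bottleneck features -> 10 x 288 stacked input.
bn = np.random.randn(10, 32)
stacked = stack_bottleneck_features(bn, context=4)
print(stacked.shape)  # (10, 288)
```

The stacked array would then serve as (part of) the input to a second DNN, giving it context comparable to a wide input window without re-processing the raw linguistic features.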
Keywords
Speech synthesis, acoustic model, multi-task learning, deep neural network, bottleneck feature