A Bangla Text-to-Speech System using Deep Neural Networks

2019 International Conference on Bangla Speech and Language Processing (ICBSLP), 2019

Abstract
We present a Deep Neural Network (DNN) based statistical parametric Text-to-Speech (TTS) system for Bangla (also known as Bengali). The first step in building a DNN-based TTS system is obtaining a large speech dataset. Since no good speech dataset for Bangla TTS is publicly available, we created our own. We prepared a phonetically rich, studio-quality speech database containing more than 40 hours of speech across 12,500 utterances. We also prepared a pronunciation dictionary (lexicon) of 135,000 words for front-end text processing, which, to our knowledge, is the largest lexicon for Bangla. Our system extracts linguistic features from the input text and then uses deep neural networks to map these linguistic features to acoustic features. We developed two TTS voices using our dataset, one male and one female. Both objective and subjective evaluations show that our system performs significantly better than traditional Bangla TTS systems and is comparable to the best commercially available Bangla TTS system.
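The mapping from linguistic features to acoustic features described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the feature dimensions, layer sizes, activation function, and the function names `init_network` and `predict_acoustic_features` are all assumptions chosen for illustration, and the weights are random rather than trained. In statistical parametric synthesis generally, a vocoder would then turn the predicted acoustic features into a waveform; that step is omitted here.

```python
import numpy as np

# All sizes below are illustrative assumptions, not values from the paper.
LINGUISTIC_DIM = 400     # hypothetical size of the per-frame linguistic feature vector
ACOUSTIC_DIM = 187       # hypothetical size of the per-frame acoustic feature vector
HIDDEN_DIM = 256         # hypothetical hidden layer width
NUM_HIDDEN_LAYERS = 3    # hypothetical network depth


def init_network(rng):
    """Create randomly initialised (untrained) weights for a feed-forward network."""
    sizes = [LINGUISTIC_DIM] + [HIDDEN_DIM] * NUM_HIDDEN_LAYERS + [ACOUSTIC_DIM]
    return [
        (rng.standard_normal((n_in, n_out)) * np.sqrt(1.0 / n_in), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def predict_acoustic_features(layers, linguistic_frames):
    """Map per-frame linguistic features to per-frame acoustic features."""
    hidden = linguistic_frames
    for weights, bias in layers[:-1]:
        hidden = np.tanh(hidden @ weights + bias)  # non-linear hidden layers
    weights, bias = layers[-1]
    return hidden @ weights + bias                 # linear output layer (regression)


rng = np.random.default_rng(0)
layers = init_network(rng)
frames = rng.random((50, LINGUISTIC_DIM))          # 50 frames of stand-in input features
acoustic = predict_acoustic_features(layers, frames)
print(acoustic.shape)  # (50, 187)
```

Each row of `frames` stands in for the linguistic description of one short speech frame, and each row of `acoustic` is the corresponding predicted parameter vector. A real system would learn the weights from aligned text and speech such as the 12,500-utterance database mentioned above.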
Keywords
SPSS, DNN, Bangla speech corpus, lexicon, open source