TacoSi: A Sinhala Text to Speech System with Neural Networks

Tharuka Kasthuri Arachchige, Ruvan Weerasinghe

2023 3rd International Conference on Advanced Research in Computing (ICARC) (2023)

Abstract
Software that turns text into speech is referred to as text-to-speech (TTS) or speech synthesis software. A vast number of TTS tools are available for various languages, including English and Chinese. Sinhala is the first language of Sri Lanka, spoken by over 16 million people who form the country's major ethnic group. A few researchers have tried to develop Sinhala-specific TTS systems using traditional methods, but these required time-consuming, manually generated features and did not produce natural-sounding speech. The main aim of this study is to develop a human-quality Sinhala TTS tool without relying on hand-crafted features. The proposed solution, TacoSi, is based on Tacotron and was trained on raw text and audio pairs. Taking raw text as input, TacoSi can produce human-like speech in Sinhala. Based on 10 respondents' ratings of 10 generated voice recordings, the proposed technique achieved a mean opinion score (MOS) of 4.39. TacoSi's intelligibility was tested with a technique based on semantically unpredictable sentences (SUS) and reached an intelligibility score of 84%, which is significantly higher than that of existing systems. More importantly, TacoSi can pronounce rare words (never seen in training) and handles most common symbols, numerical values, and abbreviations used in written Sinhala.
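As a rough illustration of how a MOS figure like the one above is typically obtained, the sketch below averages listener ratings on a 1 to 5 scale across all listeners and clips. The function name and the ratings are hypothetical and chosen only for illustration; they are not the study's data, and the abstract does not state the exact aggregation procedure the authors used.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average all 1-5 ratings, given one list of clip scores per listener.

    Hypothetical helper: assumes a plain arithmetic mean over every
    (listener, clip) rating, which is the usual MOS definition.
    """
    flat = [score for listener in ratings for score in listener]
    if not flat:
        raise ValueError("no ratings supplied")
    if any(not 1 <= score <= 5 for score in flat):
        raise ValueError("scores must be on the 1-5 scale")
    return mean(flat)


# Made-up ratings: 2 listeners x 3 clips (NOT the paper's 10 x 10 data).
example_ratings = [
    [5, 4, 4],
    [4, 5, 4],
]
mos = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f}")
```

With 10 listeners each rating 10 recordings, as in the study, the same calculation would average 100 scores.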
Keywords
TTS,Sinhala,NLP,Neural Network,Deep Learning