Learning to Recognize Speech From Chaotically Synthesized Data

Semantic Scholar (2017)

Abstract
Academic researchers in speech recognition have for years been stymied by a lack of large corpora of speech audio with accurate transcripts. To combat this, we present a novel method for synthetically creating an arbitrarily large dataset of transcribed speech. We designed a synthetic speech generator that creates speech audio from Wikipedia articles. To model human variation and natural noise, we injected “chaos” into the speech generation pipeline. The resulting corpus of chaotically synthesized speech served as the input data to our end-to-end deep learning speech recognizer. Through experiments, we show that our synthetic speech data can be successfully used to augment a model trained on the Switchboard dataset. With more computation, we hope that the synthetic speech model will be able to exceed state-of-the-art results in speech recognition.