Comparison of modern and traditional Slovak children’s speech recognition

2023 World Symposium on Digital Intelligence for Systems and Machines (DISA)(2023)

Abstract
We compare two distinct speech recognition approaches: hybrid models that combine hidden Markov models with deep neural networks, and modern end-to-end neural speech recognition architectures. Our evaluation focuses on recognition performance under low-resource conditions. We use a newly completed Slovak speech recognition dataset containing children’s speech from a Slovak public TV talk show intended for children. Our objective is to assess the feasibility of using this Slovak children’s speech dataset with state-of-the-art end-to-end speech recognition, specifically the ESPnet2 framework. We anticipate that the end-to-end model will perform worse than any tailored hybrid hidden Markov model. The results of the baseline experiments motivate us to apply other advanced techniques, which we propose in order to further improve the quality of the models moving forward.
Keywords
acoustic models, ASR, byte-pair encoding, children’s speech, convolution, CTC, ESPnet, GMM, HMM, Kaldi, low-resource, MLLR, SAT, Slovak, speech recognition, SpecAugment, TDNN, triphone models