Knowledge of accent differences can predict speech recognition errors

INTERSPEECH 2022 (2022)

Abstract
If accent differences can predict the type of speech recognition errors, a smaller dataset that systematically represents those differences might be sufficient, and less resource intensive, for adapting an automatic speech recognition (ASR) system to a novel variety than training the system on a large, unsystematic dataset. However, it is not known whether ASR errors pattern according to accent differences. We therefore tested the performance of Google's General American (GenAm) and Standard Australian English (SAusE) ASR on both dialects, using words that systematically represent accent differences. Accent differences were quantified in terms of differences in the number of vowel phonemes, in the phonetic quality of vowels, and in rhoticity (i.e., the presence or absence of postvocalic /r/). Our results confirm that word recognition is significantly more accurate when the ASR dialect matches the speaker dialect than in the mismatched condition. They also reveal that GenAm ASR is less accurate on SAusE speakers because of the higher number of vowel phonemes and the lack of postvocalic /r/ in SAusE. Thus, the amount of data needed to adapt ASR from GenAm to SAusE might be reduced by using a small dataset that focuses on differences in the size of the vowel inventory and in rhoticity.
Keywords
automatic speech recognition, accent differences, adapting ASR to novel varieties
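
The matched versus mismatched comparison described in the abstract can be illustrated with a minimal sketch. It assumes each trial is recorded as a tuple of ASR dialect, speaker dialect, target word, and recognized word; the function name `accuracy_by_condition` and the toy trials are hypothetical and are not taken from the paper, whose actual scoring and statistical analysis may differ.

```python
from collections import defaultdict


def accuracy_by_condition(trials):
    """Return word recognition accuracy for matched and mismatched conditions.

    trials: iterable of (asr_dialect, speaker_dialect, target_word, recognized_word).
    This record layout is an assumption made for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [correct, total]
    for asr, speaker, target, recognized in trials:
        # A trial is "matched" when the ASR dialect equals the speaker dialect.
        condition = "matched" if asr == speaker else "mismatched"
        counts[condition][0] += int(target.lower() == recognized.lower())
        counts[condition][1] += 1
    return {cond: correct / total for cond, (correct, total) in counts.items()}


# Toy, made-up trials purely for illustration (not data from the paper).
toy_trials = [
    ("GenAm", "GenAm", "card", "card"),
    ("GenAm", "SAusE", "card", "cod"),
    ("SAusE", "SAusE", "card", "card"),
    ("SAusE", "GenAm", "card", "card"),
]
result = accuracy_by_condition(toy_trials)
```

Splitting the same tally further by accent feature (vowel inventory size, vowel quality, rhoticity) would be one way to check whether errors cluster on the features the abstract names, though that grouping is likewise an assumption about how such data might be organized.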