Improving Automatic Speech Recognition by Classifying Adult and Child Speakers into Separate Groups using Speech Rate Rhythmicity Parameter

2020 International Conference on Signal Processing and Communications (SPCOM) (2020)

Abstract
When children's speech is transcribed using acoustic models trained on adults' data, recognition performance degrades severely. Similar degradation is observed when adults' speech is recognized by an automatic speech recognition (ASR) system trained on children's speech. This problem can be overcome by using two separate ASR systems, one for each group of speakers. This approach, however, requires an effective technique for detecting whether a given utterance comes from an adult or a child speaker. In this paper, we present a simple and novel technique for this detection task. The proposed approach is based on the speech-rate rhythmicity parameter (SRRP). Since the speaking rates of adults and children differ significantly, the SRRP values are also very different for the two groups of speakers. Hence, by computing the SRRP value for a given speech utterance, it can be easily determined whether it is from an adult or a child speaker. The corresponding ASR system can then be used to achieve improved recognition performance. Alternatively, existing techniques for improving children's speech recognition on systems trained on adult data can be applied directly once it is known that the data is from a child speaker. Both these aspects have been experimentally validated in this work.
Keywords
Speech recognition, children’s speech recognition, speaking-rate, speech-rate rhythmicity parameter