A Frequency Normalization Technique for Kindergarten Speech Recognition Inspired by the Role of f(o) in Vowel Perception

INTERSPEECH (2019)

Abstract
Accurate automatic speech recognition (ASR) of kindergarten speech is particularly important, as this age group may benefit the most from voice-based educational tools. Because young-child speech data are scarce, kindergarten ASR systems are often trained on older-child or adult speech. This study proposes a fundamental frequency (f(o))-based normalization technique to reduce the spectral mismatch between kindergarten and older-child speech. The technique builds on the tonotopic distances between formants and f(o) that were developed to model vowel perception, and it relies only on computing the median f(o) across an utterance. The tonotopic distances for vowel perception were reformulated as a linear relationship between formants and f(o), which provides an effective approach to frequency normalization. This reformulation was verified by examining the formants and f(o) of child vowel productions. A 208-word ASR experiment, using older-child speech for training and kindergarten speech for testing, compared the proposed technique against piecewise vocal tract length, F3-based, and subglottal resonance normalization. Results suggest that, relative to these techniques, the proposed technique either performs better or requires fewer parameters to be computed.
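The abstract states that the method needs only the median f(o) of each utterance and treats the formant-to-f(o) relationship as linear. The following is a minimal sketch under loudly stated assumptions: it takes the simplest proportional (zero-intercept) form of that relationship, so each test utterance's frequency axis is scaled by the ratio of a reference median f(o) (for example, one estimated from the older-child training data) to the utterance's own median f(o). The function names `median_f0`, `warp_factor`, and `warp_frequencies`, the convention that unvoiced frames are marked with 0 or NaN, and the proportional mapping itself are hypothetical illustrations, not the paper's exact formulation. The f(o) track is assumed to come from an external pitch tracker.

```python
import numpy as np


def median_f0(f0_track):
    """Median f0 (Hz) over voiced frames.

    Assumption: the pitch tracker marks unvoiced frames with 0 or NaN.
    """
    f0 = np.asarray(f0_track, dtype=float)
    voiced = f0[np.isfinite(f0) & (f0 > 0)]
    if voiced.size == 0:
        raise ValueError("no voiced frames in utterance")
    return float(np.median(voiced))


def warp_factor(f0_utt_median, f0_ref_median):
    """Hypothetical proportional mapping: if formants scale linearly with f0,
    the test speaker's frequencies are multiplied by f0_ref / f0_utt."""
    return f0_ref_median / f0_utt_median


def warp_frequencies(freqs_hz, alpha, f_max):
    """Scale frequencies (e.g., formants or filterbank centers) by alpha,
    clipping the result to the valid range [0, f_max]."""
    return np.clip(np.asarray(freqs_hz, dtype=float) * alpha, 0.0, f_max)
```

As an illustration with made-up numbers, an utterance whose median f(o) is 300 Hz, normalized toward a reference median of 240 Hz, gets `alpha = 0.8`, so frequencies of 1000 Hz and 2500 Hz map to 800 Hz and 2000 Hz. In a real front end the same factor would more typically be applied to the filterbank center frequencies than to individual formants, much as vocal tract length normalization warps the frequency axis.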
Keywords
child speech recognition, frequency normalization, fundamental frequency