Automatic pronunciation scoring for language instruction.

H. Franco, L. Neumeyer, Y. Kim, O. Ronen

ICASSP (1997)

Abstract
This work is part of an effort aimed at developing computer-based systems for language instruction; we address the task of grading the pronunciation quality of the speech of a student of a foreign language. The automatic grading system uses SRI's Decipher™ continuous speech recognition system to generate phonetic segmentations. Based on these segmentations and probabilistic models, we produce pronunciation scores for individual sentences or groups of sentences. Scores obtained from expert human listeners are used as the reference to evaluate the different machine scores and to provide targets when training some of the algorithms. In previous work (1) we found that duration-based scores outperformed HMM log-likelihood-based scores. In this paper we show that we can significantly improve HMM-based scores by using average phone segment posterior probabilities. Correlation between machine and human scores went up from r=0.50 with likelihood-based scores to r=0.88 with posterior-based scores. The new measures also outperformed duration-based scores in their ability to produce reliable scores from only a few sentences.

[...] to obtain spectral match and duration scores. The effectiveness of the different machine scores is evaluated based on their correlation with expert human scores on a large database. Previous approaches were based on statistical models built for specific sentences (5). The current algorithms were designed to produce pronunciation scores for arbitrary sentences, that is, sentences for which there is no acoustic training data (1). This approach allows great flexibility in the design of language instruction systems because new pronunciation exercises can be added without retraining the scoring system. We extend previous work (1) by introducing a new HMM-based score based on phone posterior probabilities. The level of human-machine correlation for this new score was significantly better than that of both likelihood and duration scores for the case of sentence-specific scoring. When averaging scores across several sentences from a given speaker to obtain speaker-level scores, we found that the new method required fewer sentences to achieve a similar level of correlation. We also investigated the combination of different machine scores to obtain a higher level of correlation. We experimented with linear and nonlinear regression as well as with an estimation-based approach to predict human scores from machine scores.
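As a rough illustration of the posterior-based score and the human-machine correlation mentioned above, the following Python sketch shows one plausible reading of "average phone segment posterior probabilities". The abstract does not give the formula, so the details here are assumptions: frame posteriors are obtained by normalizing likelihood times prior over all phones, the log posterior of the aligned phone is averaged over the frames of each segment, and segment scores are then averaged over the sentence. The function names and data layout are hypothetical and are not taken from the paper or from Decipher.

```python
import math

def frame_log_posteriors(frame_log_likes, log_priors):
    """Log posterior of every phone for one frame (assumed Bayes normalization)."""
    joint = [ll + lp for ll, lp in zip(frame_log_likes, log_priors)]
    # Log-sum-exp over phones for a numerically stable normalizer.
    m = max(joint)
    log_norm = m + math.log(sum(math.exp(j - m) for j in joint))
    return [j - log_norm for j in joint]

def sentence_posterior_score(log_likes, log_priors, segments):
    """Average over segments of the per-segment mean frame log posterior.

    log_likes: list of frames, each a list of per-phone log-likelihoods.
    segments:  list of (phone_index, start_frame, end_frame), end exclusive,
               standing in for the recognizer's phonetic segmentation.
    """
    seg_scores = []
    for phone, start, end in segments:
        frames = log_likes[start:end]
        total = sum(frame_log_posteriors(f, log_priors)[phone] for f in frames)
        seg_scores.append(total / len(frames))  # normalize by segment duration
    return sum(seg_scores) / len(seg_scores)

def pearson_r(xs, ys):
    """Pearson correlation, the statistic reported as r in the abstract."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

In use, one would compute `sentence_posterior_score` for each graded sentence and pass the resulting machine scores, together with the expert human scores, to `pearson_r`. Averaging within each segment before averaging across segments keeps long phones from dominating the sentence score, which is one reason to prefer it in a sketch like this.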
Keywords
computer aided instruction, correlation methods, hidden Markov models, probability, speech intelligibility, speech processing, speech recognition, HMM log-likelihood based scores, SRI Decipher, algorithms training, automatic grading system, automatic pronunciation scoring, average phone segment posterior probabilities, computer based systems, continuous speech recognition system, correlation, duration based scores, expert human listeners, foreign language student, language instruction, machine scores, phonetic segmentations, probabilistic models, pronunciation quality, pronunciation scores, sentences, speech quality