Large vocabulary conversational speech recognition with a subspace constraint on inverse covariance matrices
INTERSPEECH (2003)
Abstract
This paper applies the recently proposed SPAM models for acoustic modeling in a Speaker Adaptive Training (SAT) context on large vocabulary conversational speech databases, including the Switchboard database. SPAM models are Gaussian mixture models in which a subspace constraint is placed on the precision matrices and means (although this paper focuses on the case of unconstrained means). They include diagonal covariance, full covariance, MLLT, and EMLLT models as special cases. Adaptation is carried out with maximum likelihood estimation of the means and feature-space under the SPAM model. This paper shows the first experimental evidence that SPAM models can achieve significant word-error-rate improvements over state-of-the-art diagonal covariance models, even when those diagonal models are given the benefit of choosing the optimal number of Gaussians (according to the Bayesian Information Criterion). This paper is also the first to apply SPAM models in a SAT context. All experiments are performed on the IBM "Superhuman" speech corpus, a challenging and diverse conversational speech test set that includes the Switchboard portion of the 1998 Hub5e evaluation data set.
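The core idea behind the subspace precision constraint can be sketched as follows: each Gaussian's precision (inverse covariance) matrix is restricted to a linear combination of a small set of shared symmetric basis matrices. This is a minimal illustration only, not the paper's implementation; the basis matrices, weights, and dimensions below are hypothetical, and a real system would train them by maximum likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 4, 3  # feature dimension; subspace dimension K << d*(d+1)/2 in practice

# Shared symmetric basis matrices S_k spanning the precision subspace
# (hypothetical values; in SPAM these are trained, not random).
def random_symmetric(dim):
    A = rng.standard_normal((dim, dim))
    return (A + A.T) / 2

S = [np.eye(d)] + [random_symmetric(d) for _ in range(K - 1)]

# Per-Gaussian weights lambda_k; the SPAM constraint is that the
# precision matrix P lies in span{S_1, ..., S_K}.
lam = np.array([2.0, 0.1, -0.05])
P = sum(l * Sk for l, Sk in zip(lam, S))

# The weights must keep the precision positive definite.
assert np.all(np.linalg.eigvalsh(P) > 0)

# Gaussian log-density under precision P with an unconstrained mean mu.
mu = rng.standard_normal(d)
x = rng.standard_normal(d)
diff = x - mu
_, logdet = np.linalg.slogdet(P)
log_density = 0.5 * (logdet - d * np.log(2 * np.pi) - diff @ P @ diff)
print(f"log-density: {log_density:.4f}")
```

The payoff is parameter sharing: each Gaussian stores only K mixture weights instead of the d(d+1)/2 entries of a full covariance, while still modeling correlations that a diagonal model cannot.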
Keywords
Bayesian information criterion, maximum likelihood estimate, word error rate, feature space