Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2014

Abstract
This paper studies cross-lingual acoustic modeling in the context of subspace Gaussian mixture models (SGMMs). SGMMs factorize the acoustic model parameters into a set that is globally shared between all the states of a hidden Markov model (HMM) and another that is specific to the HMM states. We demonstrate that the SGMM global parameters are transferable between languages, particularly when the parameters are trained multilingually. As a result, acoustic models may be trained using limited amounts of transcribed audio by borrowing the SGMM global parameters from one or more source languages and training only the state-specific parameters on the target language audio. Model regularization using an ℓ1-norm penalty is shown to be particularly effective at avoiding overtraining, leading to lower word error rates. We investigate maximum a posteriori (MAP) adaptation of the subspace parameters in order to reduce the mismatch between the SGMM global parameters of the source and target languages. In addition, monolingual and cross-lingual speaker adaptive training is used to reduce the model variance introduced by speakers. We have systematically evaluated these techniques through experiments on the GlobalPhone corpus.
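The factorization the abstract describes can be sketched with the standard SGMM formulation from the literature (notation here is illustrative, not taken from this paper): each HMM state j is represented by a low-dimensional state vector v_j, and the state's Gaussian means and mixture weights are derived from v_j through globally shared subspace parameters.

```latex
% Illustrative SGMM state-level model: state j is a mixture over
% I full-covariance Gaussians whose parameters are generated from
% a state vector v_j via shared subspace parameters.
p(\mathbf{x} \mid j)
  = \sum_{i=1}^{I} w_{ji}\,
    \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_{ji}, \boldsymbol{\Sigma}_i\right),
\qquad
\boldsymbol{\mu}_{ji} = \mathbf{M}_i \mathbf{v}_j,
\qquad
w_{ji} = \frac{\exp\!\left(\mathbf{w}_i^\top \mathbf{v}_j\right)}
              {\sum_{i'=1}^{I} \exp\!\left(\mathbf{w}_{i'}^\top \mathbf{v}_j\right)}
```

Under this factorization, the globally shared set {M_i, w_i, Σ_i} is what the paper transfers from the source language(s), while only the state-specific vectors v_j need to be estimated from the limited target-language audio.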
Keywords
cross-lingual subspace Gaussian mixture models, low-resource speech recognition, cross-lingual acoustic modeling, acoustic model parameters, SGMM global parameters, subspace parameters, HMM states, hidden Markov models, Gaussian processes, maximum likelihood estimation, maximum a posteriori (MAP) adaptation, model regularization, ℓ1-norm penalty, model variance reduction, monolingual and cross-lingual speaker adaptive training, source languages, target language audio, transcribed audio, training data, data models, word error rates, GlobalPhone corpus