BLHUC: Bayesian Learning of Hidden Unit Contributions for Deep Neural Network Speaker Adaptation

2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

Abstract
Speaker adaptation techniques play a key role in reducing the mismatch between speech recognition systems and target users. To learn speaker-dependent adaptation parameters robustly, model-based DNN adaptation techniques often require a significant amount of data. For example, the commonly used learning hidden unit contributions (LHUC) approach to DNN adaptation uses speaker-dependent, high-dimensional scaling vectors applied to hidden layer outputs. When only limited adaptation data are available, standard LHUC is prone to over-fitting and poor generalization. To address this issue, Bayesian learning of hidden unit contributions (BLHUC) is proposed in this paper. A posterior distribution over the LHUC scaling vectors is used to explicitly model the uncertainty associated with the adaptation parameters. An efficient approach based on variational inference is adopted to estimate the LHUC parameter posterior distribution. Experiments conducted on a 300-hour Switchboard setup showed that the proposed BLHUC method outperformed the baseline speaker-independent DNN systems and the LHUC adapted DNN systems by absolute word error rate reductions of up to 1.4% and 1.1%, respectively, when using only one utterance of adaptation data from each speaker. Consistent performance improvements were also obtained over the baseline, LHUC adapted and LHUC SAT systems when the amount of adaptation data was increased.
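The contrast between a point-estimate LHUC scaling vector and a posterior distribution over it can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. Several details are assumptions that the abstract does not state: the `2 * sigmoid(r)` parameterization of the scaling factors, a diagonal Gaussian posterior with a standard normal prior, sampling by reparameterization, and the use of the posterior mean at test time. All function and variable names are hypothetical.

```python
import numpy as np


def lhuc_scale(r):
    """Map unconstrained parameters r to scaling factors in (0, 2).

    Assumption: the 2 * sigmoid(r) form; the abstract does not specify it.
    """
    return 2.0 / (1.0 + np.exp(-r))


def lhuc_forward(hidden, r):
    """Scale hidden outputs of shape (batch, units) by a speaker vector r."""
    return hidden * lhuc_scale(r)


def blhuc_sample_forward(hidden, mu, log_sigma, rng):
    """Draw r ~ N(mu, sigma^2) by reparameterization, then apply the scaling.

    Assumption: a diagonal Gaussian variational posterior over r.
    """
    eps = rng.standard_normal(mu.shape)
    r = mu + np.exp(log_sigma) * eps
    return lhuc_forward(hidden, r)


def kl_to_standard_normal(mu, log_sigma):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over hidden units.

    Assumption: a standard normal prior. This term regularizes the
    adaptation parameters when little speaker data is available.
    """
    sigma_sq = np.exp(2.0 * log_sigma)
    return 0.5 * np.sum(sigma_sq + mu ** 2 - 1.0 - 2.0 * log_sigma)


def blhuc_mean_forward(hidden, mu):
    """Test-time scaling with the posterior mean (an assumed simplification)."""
    return lhuc_forward(hidden, mu)


# Hypothetical usage: 4 frames, 3 hidden units, one speaker.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 3))
mu = np.zeros(3)          # posterior mean, initialized at the prior mean
log_sigma = np.zeros(3)   # log standard deviation, initialized at the prior
sampled = blhuc_sample_forward(hidden, mu, log_sigma, rng)
penalty = kl_to_standard_normal(mu, log_sigma)
```

In a training loop under these assumptions, the adaptation objective would combine the recognition loss computed on `sampled` with `penalty`, so that with very little adaptation data the posterior stays close to the prior instead of over-fitting.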
Keywords
Bayesian learning, LHUC, speaker adaptation