Integrated Adaptation With Multi-Factor Joint-Learning For Far-Field Speech Recognition

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016

Cited by 21 | Views 47
Abstract
Although great progress has been made in automatic speech recognition (ASR), performance still degrades substantially in distant-talking scenarios due to significantly lower signal power. In this paper, a novel adaptation framework, named integrated adaptation with multi-factor joint-learning, is proposed to improve the accuracy of distant speech recognition. We extract speaker, phone, and environment factor representations using deep neural networks (DNNs), which are integrated into the main ASR DNN to improve classification accuracy. In addition, the hidden activations of the main ASR DNN are used to improve the factor extraction, which in turn helps the ASR DNN. All model parameters, including those of the ASR DNN and the factor extractor DNNs, are jointly optimized under the multi-task learning framework. Furthermore, unlike prior techniques, our approach requires no explicit separate stages for factor extraction and adaptation. Experiments on the AMI single distant microphone (SDM) task show that the proposed architecture significantly reduces word error rate (WER), and further improvement can be achieved by combining it with i-vector adaptation. Our best configuration obtained more than 15% and 10% relative WER reduction over the baselines using SDM- and close-talk-generated alignments, respectively.
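The joint architecture described in the abstract, where factor extractor DNNs consume the main ASR DNN's hidden activations and feed their embeddings back into its upper layers under a single multi-task objective, can be sketched as a forward pass. This is a minimal illustration, not the paper's implementation; all layer sizes, loss weights, and the two-factor (speaker/environment) setup are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes, chosen purely for illustration.
B, F, H, E = 8, 40, 64, 16          # batch, acoustic features, hidden, factor embedding
n_senone, n_spk, n_env = 50, 10, 4  # output classes per task

x = rng.standard_normal((B, F))     # stand-in acoustic feature frames

# Main ASR DNN: bottom hidden layer on acoustic features.
h1 = relu(x @ (rng.standard_normal((F, H)) * 0.1))

# Factor extractors see the input AND the main DNN's hidden
# activations, so recognition and factor extraction inform each other.
xh = np.concatenate([x, h1], axis=1)
spk_emb = relu(xh @ (rng.standard_normal((F + H, E)) * 0.1))
env_emb = relu(xh @ (rng.standard_normal((F + H, E)) * 0.1))

# Factor embeddings are integrated back into the main DNN's upper layer.
h2 = relu(np.concatenate([h1, spk_emb, env_emb], axis=1)
          @ (rng.standard_normal((H + 2 * E, H)) * 0.1))

# Task heads: senone classification plus auxiliary factor targets.
p_senone = softmax(h2 @ (rng.standard_normal((H, n_senone)) * 0.1))
p_spk = softmax(spk_emb @ (rng.standard_normal((E, n_spk)) * 0.1))
p_env = softmax(env_emb @ (rng.standard_normal((E, n_env)) * 0.1))

def xent(p, y):  # mean cross-entropy against integer labels
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

y_sen = rng.integers(0, n_senone, B)
y_spk = rng.integers(0, n_spk, B)
y_env = rng.integers(0, n_env, B)

# Multi-task objective: all parameters (main DNN and extractors) would be
# optimized jointly against one weighted loss; the 0.3 weights are illustrative.
loss = xent(p_senone, y_sen) + 0.3 * xent(p_spk, y_spk) + 0.3 * xent(p_env, y_env)
print(round(float(loss), 3))
```

Because the gradient of this single loss flows through both the task heads and the shared hidden layers, one backward pass updates the ASR DNN and the factor extractors together, which is what removes the need for separate extraction and adaptation stages.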
Keywords
Far-field speech recognition, Deep neural network, Factor representation, Multi-task learning, Integrated adaptation