Investigating Factor Analysis Features For Deep Neural Networks In Noisy Speech Recognition

16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5 (2015)

Abstract
Speaker and channel adaptation is of substantial interest for improving the performance of deep neural network (DNN) based automatic speech recognition (ASR) systems. Recently, speaker identity vectors (i-vectors) have shown improvements for ASR systems in matched conditions. In this paper, we propose applying the general factor analysis framework to noisy speech recognition tasks. We explore several methods for deriving speaker and channel factors, including joint factor analysis (JFA) and i-vectors derived from DNN posteriors instead of the traditional universal background model (UBM) approach. We also experiment with the late fusion of i-vector features with bottleneck (BN) features obtained from a previously trained convolutional neural network (CNN) system. The ASR experiments are performed on the Aspire challenge test data, which contains noisy far-field speech, while the acoustic models are trained on conversational telephone speech (CTS) data from the Fisher corpus. In these experiments, we show that the factor analysis based methods provide significant improvements in word error rate (relative improvements of about 11% over the baseline DNN system trained with speaker adapted features).
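The abstract does not say how the i-vector and BN features are fused. As a minimal sketch, assume the utterance-level i-vector is appended to every frame-level BN vector before the result is passed to the acoustic model. The function name `fuse_features` and the toy dimensions are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def fuse_features(bn_frames: Sequence[Sequence[float]],
                  ivector: Sequence[float]) -> List[List[float]]:
    """Append one utterance-level i-vector to every frame-level BN vector.

    Assumption: fusion is plain concatenation; the paper may do otherwise.
    """
    if not bn_frames:
        return []
    dim = len(bn_frames[0])
    if any(len(frame) != dim for frame in bn_frames):
        raise ValueError("all BN frames must share one dimensionality")
    # The same i-vector is repeated for each frame of the utterance.
    return [list(frame) + list(ivector) for frame in bn_frames]


# Hypothetical toy sizes: 3 frames of 4-dim BN features, one 2-dim i-vector.
bn = [[0.1, 0.2, 0.3, 0.4],
      [0.5, 0.6, 0.7, 0.8],
      [0.9, 1.0, 1.1, 1.2]]
iv = [-1.0, 2.0]
fused = fuse_features(bn, iv)
```

Each fused frame has the BN dimensionality plus the i-vector dimensionality. This gives the network a constant speaker and channel summary alongside the time-varying features.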
Keywords
Factor analysis, Speaker and Channel Adaptation, Deep Neural Networks, Automatic Speech Recognition