Exploring Multidimensional LSTMs for Large Vocabulary ASR

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016

Cited by 61 | Views 210
Abstract
Long short-term memory (LSTM) recurrent neural networks (RNNs) have recently shown significant performance improvements over deep feed-forward neural networks. A key aspect of these models is the use of time recurrence, combined with a gating architecture that allows them to track the long-term dynamics of speech. Inspired by human spectrogram reading, we recently proposed the frequency LSTM (F-LSTM) that performs 1-D recurrence over the frequency axis and then performs 1-D recurrence over the time axis. In this study, we further improve the acoustic model by proposing a 2-D, time-frequency (TF) LSTM. The TF-LSTM jointly scans the input over the time and frequency axes to model spectro-temporal warping, and then uses the output activations as the input to a time LSTM (T-LSTM). The joint time-frequency modeling better normalizes the features for the upper layer T-LSTMs. Evaluated on a 375-hour short message dictation task, the proposed TF-LSTM obtained a 3.4% relative WER reduction over the best T-LSTM. The invariance property achieved by joint time-frequency analysis is demonstrated on a mismatched test set, where the TF-LSTM achieves a 14.2% relative WER reduction over the best T-LSTM.
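The joint time-frequency scan described above can be illustrated with a 2-D (multidimensional) LSTM in which each (t, f) position receives the input feature plus the hidden and cell states of its time predecessor (t-1, f) and frequency predecessor (t, f-1), each with its own forget gate. The NumPy sketch below shows this recurrence pattern in a generic form; the class and function names (`TFLSTMCell`, `tf_lstm_scan`), weight shapes, and initialization are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TFLSTMCell:
    """Minimal 2-D time-frequency LSTM cell (illustrative sketch).

    Each (t, f) position sees input x[t, f] plus hidden states from
    the time predecessor (t-1, f) and frequency predecessor (t, f-1),
    with a separate forget gate for each predecessor's cell state.
    """
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        d = input_dim + 2 * hidden_dim  # concatenation of x, h_time, h_freq
        self.H = hidden_dim
        # Gates: input (i), forget-time (ft), forget-freq (ff),
        # output (o), candidate cell (g). Shapes are assumptions.
        self.W = {k: rng.normal(0.0, 0.1, (hidden_dim, d))
                  for k in ("i", "ft", "ff", "o", "g")}
        self.b = {k: np.zeros(hidden_dim) for k in self.W}

    def step(self, x, h_time, c_time, h_freq, c_freq):
        z = np.concatenate([x, h_time, h_freq])
        i = sigmoid(self.W["i"] @ z + self.b["i"])
        ft = sigmoid(self.W["ft"] @ z + self.b["ft"])  # gates the time cell state
        ff = sigmoid(self.W["ff"] @ z + self.b["ff"])  # gates the frequency cell state
        o = sigmoid(self.W["o"] @ z + self.b["o"])
        g = np.tanh(self.W["g"] @ z + self.b["g"])
        c = ft * c_time + ff * c_freq + i * g
        h = o * np.tanh(c)
        return h, c

def tf_lstm_scan(cell, X):
    """Jointly scan a (T, F, D) feature map over time and frequency.

    Returns the (T, F, H) hidden activations, which in the paper's
    architecture would then feed the upper-layer time LSTM (T-LSTM).
    """
    T, F, _ = X.shape
    H = cell.H
    h = np.zeros((T, F, H))
    c = np.zeros((T, F, H))
    zero = np.zeros(H)
    for t in range(T):
        for f in range(F):
            h_t = h[t - 1, f] if t > 0 else zero
            c_t = c[t - 1, f] if t > 0 else zero
            h_f = h[t, f - 1] if f > 0 else zero
            c_f = c[t, f - 1] if f > 0 else zero
            h[t, f], c[t, f] = cell.step(X[t, f], h_t, c_t, h_f, c_f)
    return h
```

For example, a spectrogram chunk of 5 frames by 8 frequency bands with 3-dimensional features, `X = rng.normal(size=(5, 8, 3))`, scanned with `hidden_dim=4` yields a `(5, 8, 4)` activation map; collapsing the frequency axis (e.g. by concatenation) per frame would produce the T-LSTM input described in the abstract.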
Keywords
LSTM, RNN, time and frequency, multidimensional