Simplified LSTMs for Speech Recognition

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019

Abstract
In this paper we explore new variants of Long Short-Term Memory (LSTM) networks for sequential modeling of acoustic features. In particular, we show that (i) removing the output gate, (ii) replacing the hyperbolic tangent nonlinearity at the cell output with hard tanh, and (iii) collapsing the cell and hidden state vectors leads to a model that is conceptually simpler than, and comparable in effectiveness to, a regular LSTM for speech recognition. The proposed model has 25% fewer parameters than an LSTM with the same number of cells, trains faster because its larger gradients lead to larger steps in weight space, and reaches a better optimum because there are fewer nonlinearities to traverse across layers. We report experimental results for both hybrid and CTC acoustic models on three publicly available English datasets: the 300-hour Switchboard corpus of telephone conversations, a 400-hour broadcast news transcription corpus, and the 176-hour MALACH corpus of Holocaust survivor testimonies. In all cases the proposed models achieve similar or better accuracy than regular LSTMs while being conceptually simpler.
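The three changes listed in the abstract can be written down directly as a recurrence. The snippet below is a minimal NumPy sketch, assuming the simplified cell keeps the input and forget gates, drops the output gate, applies hard tanh to the state, and uses a single vector as both cell and hidden state; the exact equations, initialization, and any additional terms in the paper may differ.

```python
# Minimal NumPy sketch of the simplified LSTM recurrence described in the abstract.
# The exact gate equations are an assumption based on the three stated changes
# (no output gate, hard tanh instead of tanh at the cell output, cell and hidden
# state collapsed into one vector); details may differ from the paper.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hard_tanh(x):
    # Piecewise-linear replacement for tanh: clip values to [-1, 1].
    return np.clip(x, -1.0, 1.0)

def simplified_lstm_step(x_t, h_prev, params):
    """One time step of the simplified cell.

    Only three gate blocks (input, forget, candidate) remain, versus four in
    a standard LSTM, which is where the ~25% parameter reduction comes from.
    """
    Wi, Ui, bi = params["i"]   # input gate
    Wf, Uf, bf = params["f"]   # forget gate
    Wg, Ug, bg = params["g"]   # candidate update

    i_t = sigmoid(Wi @ x_t + Ui @ h_prev + bi)
    f_t = sigmoid(Wf @ x_t + Uf @ h_prev + bf)
    g_t = np.tanh(Wg @ x_t + Ug @ h_prev + bg)

    # Cell and hidden state are the same vector; there is no output gate,
    # and hard tanh bounds the state instead of tanh.
    h_t = hard_tanh(f_t * h_prev + i_t * g_t)
    return h_t

def init_params(n_in, n_cells, rng):
    def block():
        return (rng.standard_normal((n_cells, n_in)) * 0.1,
                rng.standard_normal((n_cells, n_cells)) * 0.1,
                np.zeros(n_cells))
    return {"i": block(), "f": block(), "g": block()}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_in, n_cells, T = 40, 8, 5          # e.g. 40-dimensional acoustic features
    params = init_params(n_in, n_cells, rng)
    h = np.zeros(n_cells)
    for t in range(T):
        x_t = rng.standard_normal(n_in)
        h = simplified_lstm_step(x_t, h, params)
    print("final state:", h)
```

With only three of the standard four gate blocks, the per-layer parameter count drops to roughly three quarters of a regular LSTM's, consistent with the 25% reduction quoted in the abstract.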
Keywords
LSTM, recurrent neural networks, conversational speech recognition, broadcast news transcription