Whole Sentence Neural Language Models

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018

Abstract
Recurrent neural networks have become increasingly popular for language modeling, achieving impressive gains in state-of-the-art speech recognition and natural language processing (NLP) tasks. Recurrent models exploit word dependencies over a much longer context window (as retained by the history states) than is feasible with n-gram language models. However, the training criterion of choice for recurrent language models continues to be the local conditional likelihood of generating the current word given the (possibly long) word context, thus making local decisions at each word. This locally conditional design fundamentally limits the model's ability to exploit whole-sentence structure. In this paper, we present our initial results on whole-sentence neural language models, which assign a probability to the entire word sequence. We extend previous work on whole-sentence maximum entropy models to recurrent language models, using Noise Contrastive Estimation (NCE) for training, as these sentence-level models are fundamentally unnormalizable. We present results on a range of tasks, from sequence identification tasks such as palindrome detection to large vocabulary continuous speech recognition (LVCSR), and demonstrate the modeling power of this approach.
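The abstract's core idea is to score an entire sentence with an unnormalized recurrent model and train it discriminatively against a noise distribution via NCE, since the partition function over all sentences is intractable. The sketch below illustrates that setup in PyTorch; it is a minimal, hypothetical illustration rather than the paper's implementation, and all names (WholeSentenceScorer, nce_loss, the noise log-probabilities, the noise ratio k) are assumptions for the example.

```python
# Minimal sketch: NCE training of an unnormalized whole-sentence RNN scorer.
# Hypothetical code; names and architecture are illustrative assumptions,
# not the paper's actual model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WholeSentenceScorer(nn.Module):
    """Assigns an unnormalized log-score s(w_1..w_T) to a whole sentence."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, 1)   # scalar score from final state

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        emb = self.embed(tokens)
        _, (h_n, _) = self.rnn(emb)
        return self.proj(h_n[-1]).squeeze(-1)  # (batch,) unnormalized log-scores

def nce_loss(model, data_sents, noise_sents, data_noise_lp, noise_noise_lp, k):
    """
    NCE: classify real sentences against k noise sentences per data sentence.
    data_noise_lp / noise_noise_lp are log-probabilities of each sentence under
    the noise distribution (e.g. a conventional, locally normalized LM),
    assumed to be precomputed.
    """
    log_k = torch.log(torch.tensor(float(k)))
    # logit = s_theta(x) - log(k * p_noise(x));  sigmoid(logit) = P(data | x)
    data_logits = model(data_sents) - (data_noise_lp + log_k)
    noise_logits = model(noise_sents) - (noise_noise_lp + log_k)
    data_term = F.binary_cross_entropy_with_logits(
        data_logits, torch.ones_like(data_logits), reduction='mean')
    noise_term = F.binary_cross_entropy_with_logits(
        noise_logits, torch.zeros_like(noise_logits), reduction='mean')
    # noise_sents holds k samples per data sentence, so weight its mean by k
    return data_term + k * noise_term
```

Because the score never has to be normalized over the space of all sentences, the model is free to use arbitrary whole-sentence features of the word sequence; NCE sidesteps the intractable partition function by turning estimation into a binary classification problem against the noise distribution.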
Keywords
Whole-sentence language models, Unnormalized models, Recurrent neural network