MODE-LSTM: A Parameter-efficient Recurrent Network with Multi-Scale for Sentence Classification

Conference on Empirical Methods in Natural Language Processing(2020)

引用 7|浏览464
暂无评分
摘要
The central problem of sentence classification is to extract multi-scale n-gram features for understanding the semantic meaning of sentences. Most existing models tackle this problem by stacking CNN and RNN models, which easily leads to feature redundancy and overfitting because of relatively limited datasets. In this paper, we propose a simple yet effective model called Multi-scale Orthogonal inDependEnt LSTM (MODE-LSTM), which not only has effective parameters and good generalization ability, but also considers multiscale n-gram features. We disentangle the hidden state of the LSTM into several independently updated small hidden states and apply an orthogonal constraint on their recurrent matrices. We then equip this structure with sliding windows of different sizes for extracting multi-scale n-gram features. Extensive experiments demonstrate that our model achieves better or competitive performance against state-of-the-art baselines on eight benchmark datasets. We also combine our model with BERT to further boost the generalization performance.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要