Set-CNN: A text convolutional neural network based on semantic extension for short text classification

Knowledge-Based Systems (2022)

Abstract
This paper proposes Set-CNN, a classification algorithm for short texts based on semantic extension. Set-CNN has three main features. First, a semantic extension mechanism based on a fast clustering algorithm is applied to enrich the features of short texts. Second, a multiple-channel convolutional framework is proposed to capture semantic features at different levels. More specifically, both ordinary 1D convolution and atrous convolution are performed on the original texts to capture local and global semantic information. Ordinary 1D convolution convolves words one by one to capture the original semantic information at the word level, which can be considered local semantic information. Atrous convolution convolves an entire short text to capture the context-level information of the original text, i.e., the global semantic information. This global information can offset the noise introduced by semantic extension. A further convolution channel, equipped with an evolved GLU, takes the extended short texts as its input to capture semantic information at the extended-context level; it also helps mitigate the vanishing gradient problem. Third, a multiple-channel version of Text-CNN is designed to generate different feature maps, which capture semantic features at different scales and provide useful information for improving the classification performance on short texts. Finally, the performance of Set-CNN is assessed extensively on four datasets, namely Subj, TREC, SST-2, and the Sogou corpus. The experimental results show that Set-CNN is more effective than state-of-the-art alternatives, including CNN-VE, multichannel CNN, and BERT. In particular, Set-CNN performs well as a lightweight text classifier, with lower computational complexity than BERT-base.
Keywords
Atrous convolution, Multichannel text-CNN, Semantic extension, Short text classification
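
The abstract contrasts ordinary 1D convolution (local, word-level context) with atrous convolution (wider context from the same kernel size) and mentions a gated channel. The following is a minimal pure-Python sketch of those two building blocks, not the authors' implementation. It assumes one scalar feature per token instead of embedding vectors, and it uses the standard gated linear unit, a * sigmoid(b), because the abstract does not specify the paper's "evolved GLU". The names conv1d, receptive_field, and glu are hypothetical helpers chosen for illustration.

```python
import math


def conv1d(seq, kernel, dilation=1):
    """Valid 1D convolution (cross-correlation) over a list of scalars.

    dilation=1 is ordinary convolution over adjacent tokens.
    dilation>1 is atrous convolution: kernel taps are spaced
    `dilation` positions apart, widening the context per output.
    """
    span = (len(kernel) - 1) * dilation + 1  # tokens covered by one output
    if len(seq) < span:
        return []  # sequence too short for even one output
    return [
        sum(w * seq[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(seq) - span + 1)
    ]


def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by a single output value."""
    return (kernel_size - 1) * dilation + 1


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def glu(values, gates):
    """Standard gated linear unit, elementwise: value * sigmoid(gate).

    Assumption: this is the plain GLU, not the paper's evolved variant.
    """
    return [v * sigmoid(g) for v, g in zip(values, gates)]


# Toy "sentence": one scalar per token (an assumption for readability).
tokens = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 1.0, 1.0]

local_features = conv1d(tokens, kernel, dilation=1)  # 3 adjacent tokens
wide_features = conv1d(tokens, kernel, dilation=2)   # 3 taps spanning 5 tokens

# A gate of 0 passes exactly half of each value, since sigmoid(0) = 0.5.
gated = glu(local_features, [0.0] * len(local_features))

print(local_features)
print(wide_features)
print(receptive_field(3, 1), receptive_field(3, 2))
print(gated)
```

With the same three-weight kernel, the dilated pass covers five tokens per output instead of three, which is the sense in which atrous convolution reaches broader context without adding parameters.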