CFSE: a Chinese short text classification method based on character frequency sub-word enhancement

Connection Science (2023)

Abstract
As a foundational task of natural language processing, text classification is widely used in information retrieval, public opinion analysis, and other related tasks. Chinese short texts have sparse features, which lowers their classification accuracy. To address this problem, this paper proposes a Chinese short text classification method based on Character Frequency Sub-word Enhancement (CFSE). First, the initial Chinese-character sequence is mapped to the corresponding Character Frequency Sub-word (CFS) sequence based on global character frequency information. Second, relationship features among the data are extracted by processing the CFS sequence with BiLSTM-Att, and the semantic features of the initial Chinese-character sequence are obtained through ERNIE. Finally, the two kinds of features are fused and fed into the text classifier to obtain the classification results. Experimental results show that the proposed method improves the classification accuracy of Chinese short texts.
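The first step, mapping a character sequence to a frequency-derived sequence, can be illustrated with a minimal sketch. The abstract does not specify how a CFS is constructed, so the log-scale bucket used here, the token names (`F0`, `F1`, `F_UNK`), and the function names are all hypothetical stand-ins, not the paper's definition.

```python
from collections import Counter


def build_char_frequencies(corpus):
    """Count how often each character occurs across the whole corpus."""
    counts = Counter()
    for text in corpus:
        counts.update(text)  # iterating a str yields its characters
    return counts


def to_cfs_sequence(text, counts):
    """Map each character to a frequency-bucket token.

    Assumption (not from the paper): the bucket is floor(log2(count)),
    and characters unseen in the corpus map to "F_UNK".
    """
    tokens = []
    for ch in text:
        n = counts.get(ch, 0)
        if n > 0:
            tokens.append(f"F{n.bit_length() - 1}")  # exact floor(log2(n))
        else:
            tokens.append("F_UNK")
    return tokens


# Toy corpus, for illustration only.
corpus = ["今天天气好", "天气预报", "好消息"]
counts = build_char_frequencies(corpus)
cfs = to_cfs_sequence("今天好雨", counts)
```

In the pipeline described above, a sequence like `cfs` would be fed to the BiLSTM-Att branch, while the original characters go to ERNIE.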
Keywords
text classification, Chinese short text, character frequency sub-word, relationship features