A Sentence Vector Based Over- Sampling Method For Imbalanced Emotion Classification

COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2014, PART II(2014)

引用 14|浏览1
暂无评分
摘要
Imbalanced training data poses a serious problem for supervised learning based text classification. Such a problem becomes more serious in emotion classification task with multiple emotion categories as the training data can be quite skewed. This paper presents a novel over-sampling method to form additional sum sentence vectors for minority classes in order to improve emotion classification for imbalanced data. Firstly, a large corpus is used to train a continuous skip-gram model to form each word vector using word/POS pair as the unit of word vector. The sentence vectors of the training data are then constructed as the sum vector of their word/POS vectors. The new minority class training samples are then generated by randomly add two sentence vectors in the corresponding class until the training samples for each class are the same so that the classifiers can be trained on fully balanced training dataset. Evaluations on NLP&CC2013 Chinese micro blog emotion classification dataset shows that the obtained classifier achieves 48.4% average precision, an 11.9 percent improvement over the state-of-art performance on this dataset (at 36.5%). This result shows that the proposed over-sampling method can effectively address the problem of data imbalance and thus achieve much improved performance for emotion classification.
更多
查看译文
关键词
Emotion classification, Imbalanced training data, Over-sampling, Sentence vector
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要