Classifier Learning From Imbalanced Corpus By Autoencoded Over-Sampling

Eunkyung Park, Raymond K. Wong, Victor W. Chu

PRICAI 2019: Trends in Artificial Intelligence, Part I (2019)

Abstract

Class imbalance is a common problem in classifier learning, yet it is difficult to solve. Textual data are ubiquitous, and their analysis has great potential in many applications. In this paper, we propose a solution for building accurate sentiment classifiers from imbalanced textual data. We first establish topic vectors to capture local and global patterns in a corpus. The synthetic minority over-sampling technique (SMOTE) is then used to balance the data while avoiding overfitting. However, we found that residual overfitting is still prominent. To address this problem, we propose an autoencoded over-sampling framework to reconstruct balanced datasets. Our extensive experiments on different datasets with various imbalance ratios and numbers of classes show that our approach is sound and effective.
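For readers unfamiliar with the synthetic minority over-sampling technique (SMOTE) named in the abstract, the following is a minimal sketch of its core interpolation step, assuming documents have already been mapped to numeric vectors. It is not the authors' implementation and does not cover the autoencoder stage; the function name `smote`, its parameters, and the toy vectors are illustrative assumptions.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment between base and the chosen neighbour
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy 2-D vectors standing in for minority-class documents (made-up values).
minority = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85], [0.3, 0.7]]
new_points = smote(minority, n_new=6)
balanced_minority = minority + new_points
```

Because each synthetic point lies on a segment between two existing minority points, the new samples stay close to the original minority region, which is one plausible reason some overfitting can remain after this step.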

Keywords

Imbalanced learning, Sentiment analysis, Over-sampling, Autoencoding
