Employing synthetic data for addressing the class imbalance in aspect-based sentiment classification

JOURNAL OF INFORMATION AND TELECOMMUNICATION(2023)

引用 0|浏览0
暂无评分
摘要
The class imbalance problem, in which the distribution of different classes in training data is unequal or skewed, is a prevailing problem. This can lead to classifier algorithms being biased, negatively impacting the performance of the minority class. In this paper, we addressed the class imbalance problem in datasets for aspect-based sentiment classification. Aspect-based Sentiment Classification (AbSC) is a type of fine-grained sentiment analysis in which sentiments about particular aspects of an entity are extracted. In this work, we addressed the issue of class imbalance by creating synthetic data. For synthetic data generation, two techniques have been proposed: paraphrasing using the PEGASUS fine-tuned model and backtranslation using the M2M100 neural machine translation model. We compared these techniques with two other class balancing techniques, such as weighted oversampling and cross-entropy loss with class weight. An extensive experimental study has been conducted on three benchmark datasets for restaurant reviews: SemEval-2014, SemEval-2015, and SemEval-2016. We applied these methods to the BERT-based deep learning model for aspect-based sentiment classification and studied the effect of balancing the data on the performance of these models. Our proposed balancing technique, using synthetic data, yielded better results than the other two existing methods for dealing with multi-class imbalance.
更多
查看译文
关键词
Aspect-based sentiment classification,sentiment analysis,imbalanced data,class Imbalance,paraphrasing,backtranslation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要