Reducing Misclassification Due to Overlapping Classes in Text Classification via Stacking Classifiers on Different Feature Subsets

Lecture Notes in Networks and Systems, Advances in Information and Communication (2022)

Abstract
Correctly classifying customer support tickets or complaints helps companies improve the quality of their service. One challenge in text classification arises when certain classes share much of the same vocabulary, which can cause the machine learning algorithm to misclassify them. The problem worsens when the dataset is imbalanced. To address this issue, we propose a stacking algorithm that combines selected classifiers operating on different feature subsets, where each subset is chosen from the features that tend to improve the recall and precision of the overlapped classes. In our approach, we first train different linear and non-linear classifiers on the full feature set. Second, we use the Chi2 test to determine, for each of our pre-trained classifiers, the feature set that best improves the F1-score for the overlapped class(es). Finally, we train a two-layer stacked model in which the best base learners from the first step form layer 1 and a strong meta-learner forms layer 2. Experimental results on a real-world dataset from a large IT organization and on a public Consumer Complaint database show an improvement in overall accuracy as well as a reduction in the misclassification rate for the overlapped classes.
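The three steps above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy tickets, the choice of base learners, the `k` values, and the logistic-regression meta-learner are all hypothetical, since the abstract does not name them.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy tickets invented for illustration. "billing" and "refund" share
# vocabulary on purpose to mimic overlapping classes.
texts = [
    "invoice shows wrong amount charged this month",
    "charged twice on my invoice for the same plan",
    "monthly invoice amount is higher than expected",
    "billing statement lists a charge I do not recognise",
    "please refund the amount charged twice",
    "refund for wrong charge has not arrived",
    "requesting a refund of last month invoice amount",
    "refund still pending after the charge was reversed",
    "internet connection drops every evening",
    "router loses wifi signal and connection is slow",
    "no internet since the outage yesterday",
    "wifi is slow and the router keeps restarting",
]
labels = ["billing"] * 4 + ["refund"] * 4 + ["network"] * 4


def base_learner(classifier, k):
    """TF-IDF -> keep the k terms with the highest Chi2 score -> classifier."""
    return make_pipeline(TfidfVectorizer(), SelectKBest(chi2, k=k), classifier)


# Layer 1: each base learner gets its own Chi2-selected feature subset.
# The k values are placeholders (assumption); in practice they would be
# tuned per classifier against the F1-score of the overlapped classes.
estimators = [
    ("nb", base_learner(MultinomialNB(), k=6)),
    ("svm", base_learner(LinearSVC(), k=12)),
    ("lr", base_learner(LogisticRegression(max_iter=1000), k="all")),
]

# Layer 2: a meta-learner trained on the cross-validated outputs of layer 1.
# Logistic regression is an assumed stand-in for the "strong meta-learner".
model = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=2,  # small only because the toy dataset is tiny
)
model.fit(texts, labels)

preds = model.predict([
    "I was charged twice and want a refund",
    "wifi connection is slow",
])
print(preds)
```

Wrapping the vectorizer and the Chi2 selector inside each pipeline means the feature subset is re-fitted within every cross-validation fold, so the meta-learner never sees predictions made with features selected on held-out tickets.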
Keywords
stacking classifiers, text misclassification, overlapping classes