Comparing Automated Machine Learning Against an Off-the-Shelf Pattern-Based Classifier in a Class Imbalance Problem: Predicting University Dropout

IEEE Access (2023)

Abstract
When facing a classification problem, data science practitioners must search through an armory of methods. Practitioners are often tempted to use off-the-shelf classifiers, including automated Machine Learning (AutoML) toolboxes; however, stand-alone classifiers are not applicable to every problem, and AutoML may be time-consuming, raising environmental and ethical concerns. To compound the problem, (commercial) AutoML toolboxes are black boxes, and practitioners are not allowed to extend them with new methods to improve their classification performance. Our main objective is to show that an off-the-shelf classifier designed for class imbalance problems can achieve performance similar to that of an AutoML toolbox. To do so, we first present the student dropout prediction case study, which most off-the-shelf classifiers find difficult to solve due to the problem's inherent class imbalance. We show that Microsoft Azure AutoML outperforms several popular stand-alone classifiers. However, multivariate PBC4cip, an off-the-shelf classifier specifically designed to deal with class imbalance, yields results that are just as good as those of Microsoft Azure AutoML, with the advantage that the expensive steps of mechanism selection and tuning are avoided. Our studies show that data science practitioners need to build for themselves a taxonomy of classification mechanisms in terms of the properties of the problem to solve. Additionally, AutoML platforms should let scientists modify the armory of classifiers and provide an explanation of both mechanism selection and mechanism tuning so that practitioners can learn further lessons.
Keywords
Automated machine learning, feature selection, imbalanced classification models, student dropout, supervised classification