Lessons from debiasing data for fair and accurate predictive modeling in education.

Expert Syst. Appl. (2023)

Abstract
The past few years have witnessed an explosion of attention given to the bias displayed by Machine Learning (ML) techniques towards different groups of people (e.g., female vs. male). Although ML techniques have been widely adopted in education, it remains largely unexplored to what extent such ML bias manifests itself in this specific setting and how it can be reduced or eliminated. Given the increasing importance of ML techniques in empowering educators to teach effectively, this study aimed to quantify the characteristics of the original datasets that might be correlated with the subsequent predictive unfairness displayed by ML models. To this end, we empirically investigated two types of data bias (i.e., distribution bias and hardness bias) towards students of different sexes and first-language backgrounds across a total of five frequently performed predictive tasks in education. Then, to improve ML fairness, we drew inspiration from the well-established research on Class Balancing Techniques (CBTs), in which samples are generated or removed to alleviate the predictive disparity between different prediction classes. We proposed two simple but effective strategies that empower class balancing techniques to alleviate data biases and improve prediction fairness. Through extensive analyses and evaluations, we demonstrated that ML models may greatly improve prediction fairness (by up to 66%) with only a small sacrifice (less than 1%) in prediction accuracy by balancing the training data with the use of students' demographic information and the overall hardness bias measure. All data and code used in this study are publicly accessible via https://github.com/lsha49/FairEdu.
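To illustrate the kind of demographic-aware class balancing the abstract describes, the sketch below oversamples each (prediction label, demographic group) cell of the training data to the size of the largest cell, so that balancing accounts for students' demographic information rather than the prediction classes alone. This is a minimal illustration, not the authors' released FairEdu code: the column names (`pass`, `sex`), the use of random oversampling as the CBT, and the omission of the hardness-bias weighting are all assumptions made here for brevity.

```python
# Minimal sketch of group-aware class balancing (illustrative, not the paper's code).
# Assumes a pandas DataFrame with a binary label column and a demographic column;
# every (label, group) cell is randomly oversampled to the largest cell's size.
import numpy as np
import pandas as pd

def balance_by_class_and_group(df, label_col="pass", group_col="sex", seed=0):
    """Randomly oversample every (label, group) cell to the largest cell size."""
    rng = np.random.default_rng(seed)
    cells = [cell for _, cell in df.groupby([label_col, group_col])]
    target = max(len(cell) for cell in cells)
    resampled = []
    for cell in cells:
        extra = target - len(cell)
        if extra > 0:
            # Duplicate randomly chosen rows from this cell until it reaches the target size.
            idx = rng.choice(cell.index, size=extra, replace=True)
            cell = pd.concat([cell, df.loc[idx]])
        resampled.append(cell)
    # Shuffle so duplicated rows are not grouped together in the training set.
    return pd.concat(resampled).sample(frac=1, random_state=seed).reset_index(drop=True)

# Example usage (hypothetical column names):
# balanced_train = balance_by_class_and_group(train_df, label_col="pass", group_col="first_language")
```

The same idea could be combined with synthetic oversampling (e.g., SMOTE) or with undersampling of the largest cells; the paper's actual strategies additionally incorporate a hardness-bias measure, which is not shown here.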
Keywords
Algorithmic fairness, Data bias, Class balancing, Predictive model, Fairness-aware machine learning, Education