An AUC-maximizing classifier for skewed and partially labeled data with an application in clinical prediction modeling

Knowledge-Based Systems (2023)

Abstract
Partially labeled and skewed datasets are common in many applications, including healthcare, owing to the high cost and time constraints of data collection and annotation. Training machine learning classifiers on such data, however, can degrade their predictive performance. In this paper, we propose a novel classifier that addresses this problem by focusing on the Area Under the Curve (AUC), which is widely recognized as a more robust performance metric for skewed datasets than alternatives such as accuracy and error rate. The new classifier, PSVM-AUC Maximizer (PSVM-AUCMax), is based on Proximal Support Vector Machines (PSVM) and directly maximizes a new AUC-based metric in its learning objective. PSVM-AUCMax has several merits. First, because it directly integrates maximization of the proposed AUC-based metric, PSVM-AUCMax can be proved to have enhanced generalization capability on partially labeled and skewed datasets. Second, it simplifies model selection by requiring fewer hyperparameters to tune. Third, its analytical solution retains the same form as that of traditional PSVM, preserving advantages such as fast updating in incremental learning scenarios. The efficacy of PSVM-AUCMax is demonstrated through extensive experiments on several public datasets and a healthcare case study using data collected at the Mayo Clinic in the US. In the case study, PSVM-AUCMax was used to develop a clinical prediction model for forecasting composite outcomes in hospitalized COVID-19 patients, with promising results. © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords
Proximal support vector machines, Imbalanced learning, Semi-supervised learning, AUC maximization learning, Clinical prediction modeling, COVID-19 prediction
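
The abstract states that PSVM-AUCMax keeps the analytical solution form of traditional PSVM and targets an AUC-based metric, but it does not give the objective itself. As background only, the sketch below shows the standard linear PSVM closed-form solution (in the Fung and Mangasarian formulation) alongside the usual pairwise estimate of AUC. It is not the authors' PSVM-AUCMax. The function names, the parameter name `nu`, and the toy data are assumptions made for illustration.

```python
import numpy as np


def fit_linear_psvm(A, d, nu=1.0):
    """Standard linear PSVM in closed form (background sketch, not PSVM-AUCMax).

    A  : (m, n) feature matrix
    d  : (m,) labels in {-1, +1}
    nu : positive trade-off parameter (name assumed for illustration)
    Returns (w, gamma) for the decision function x @ w - gamma.
    """
    m, n = A.shape
    # E = [A, -e], where e is a column of ones
    E = np.hstack([A, -np.ones((m, 1))])
    # Solve (I/nu + E^T E) z = E^T d, with z = [w; gamma]
    z = np.linalg.solve(np.eye(n + 1) / nu + E.T @ E, E.T @ d)
    return z[:n], z[n]


def decision_scores(A, w, gamma):
    """Real-valued scores; larger means more likely positive."""
    return A @ w - gamma


def pairwise_auc(scores, d):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = scores[d == 1]
    neg = scores[d == -1]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


# Toy skewed dataset (hypothetical): 2 positives, 3 negatives
A_toy = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
d_toy = np.array([1.0, 1.0, -1.0, -1.0, -1.0])
w_toy, gamma_toy = fit_linear_psvm(A_toy, d_toy, nu=1.0)
toy_auc = pairwise_auc(decision_scores(A_toy, w_toy, gamma_toy), d_toy)
print("toy AUC:", toy_auc)
```

Because the fit reduces to a single linear solve on an (n + 1) by (n + 1) system, updating the model as data arrive is cheap, which is the kind of advantage the abstract says PSVM-AUCMax preserves. The pairwise AUC depends only on how positives rank against negatives, not on their proportions, which is why it is less sensitive to class skew than accuracy.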