A Region-based Training Data Segmentation Strategy to Credit Scoring

SECRYPT: Proceedings of the 19th International Conference on Security and Cryptography (2022)

Abstract
Rating users who request financial services is a growing task, especially during the COVID-19 pandemic, a period characterized by a dramatic increase in online activities, mainly related to e-commerce. This kind of assessment was performed manually in the past; today it must be carried out by automatic credit scoring systems because of the enormous number of requests to process. Such systems therefore play a crucial role for financial operators, as their effectiveness is directly related to monetary gains and losses. Despite the huge investments of financial and human resources devoted to their development, state-of-the-art solutions are all affected by some well-known problems that make building credit scoring systems challenging, mainly the imbalance and heterogeneity of the data involved, to which is added the scarcity of public datasets. The Region-based Training Data Segmentation (RTDS) strategy proposed in this work revolves around a divide-and-conquer approach, where the user classification depends on the results of several sub-classifications. In more detail, the training data is divided into regions that bound different users and features; these regions are used to train several classification models, whose outputs are combined into the final classification through a majority voting rule. The strategy relies on the consideration that the independent analysis of different users and features can lead to a more accurate classification than that offered by a single evaluation model trained on the entire dataset. The validation process, carried out on three public real-world datasets with different numbers of features and samples and different degrees of data imbalance, demonstrates the effectiveness of the proposed strategy, which outperforms the canonical training approach on all the datasets.
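The divide-and-conquer idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how regions are chosen, so the contiguous row and column blocks used here are an assumption, the `NearestCentroid` class is a hypothetical stand-in for whatever base classifier the paper uses, and the names `split_ranges`, `train_regions`, and `predict_majority` are illustrative only.

```python
from collections import Counter
from itertools import product


def split_ranges(n, parts):
    """Split range(n) into `parts` contiguous, nearly equal index lists."""
    bounds = [i * n // parts for i in range(parts + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(parts)]


class NearestCentroid:
    """Tiny placeholder classifier (an assumption, not the paper's model)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        # One mean vector per class seen in this region.
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, row):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[c]))
        return min(sorted(self.centroids), key=dist)


def train_regions(X, y, row_parts, col_parts):
    """Train one model per region (a block of users x a block of features)."""
    row_blocks = split_ranges(len(X), row_parts)
    col_blocks = split_ranges(len(X[0]), col_parts)
    models = []
    for rows, cols in product(row_blocks, col_blocks):
        X_region = [[X[i][j] for j in cols] for i in rows]
        y_region = [y[i] for i in rows]
        models.append((cols, NearestCentroid().fit(X_region, y_region)))
    return models


def predict_majority(models, row):
    """Combine the per-region predictions with a majority vote."""
    votes = Counter(m.predict_one([row[j] for j in cols]) for cols, m in models)
    # Ties go to the label that was voted for first.
    return votes.most_common(1)[0][0]


# Toy data: label 0 clusters near 0, label 1 clusters near 9.
X = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1], [9, 9, 9, 9],
     [8, 9, 8, 9], [9, 8, 9, 8], [0, 0, 1, 1], [9, 9, 8, 8]]
y = [0, 0, 0, 1, 1, 1, 0, 1]
models = train_regions(X, y, row_parts=2, col_parts=2)
```

With two row blocks and two column blocks, the sketch trains four sub-models, each seeing only a quarter of the user-feature matrix, and `predict_majority(models, row)` returns the label most of them agree on. A region that happens to contain a single class will always vote for that class, which is one reason the choice of regions matters in practice.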
Keywords
Business Intelligence, Decision Support System, Risk Assessment, Credit Scoring, Machine Learning