Trend constrained optimal predictor discretisation for insurance and banking using a network flow approach

G. Bodenstein, Max Krüger, Humphrey Brydon, Rénette Blignaut

Research Square (2023)

Abstract
Discretising (or binning) continuous predictors forms part of the data preparation and exploration phase when building predictive models. Discretisation partitions a predictor variable's support into a set of disjoint intervals separated by cut-points. It is a valuable tool in predictive modelling and preliminary variable selection. It eases interpretability and helps researchers gauge the relationship between a predictor and the response of interest, while providing automatic protection against outliers and missing values. Some machine learning and data mining techniques work better with categorical or discretised continuous variables than with raw quantitative values. In some instances, the discretisation process must consider business, operational, or best-practice constraints on the selected intervals and on parameter trends across consecutive intervals of a predictor variable. Choosing cut-points that maintain a specified trend in the relationship between the discretised predictor and the response variable, while also adhering to additional side constraints, is difficult. The aim is to provide a concise summary of a predictor variable with as little loss of predictive power as possible while satisfying the side constraints. This trade-off between predictive power and arity (the number of intervals) over a constrained solution space allows the discretisation problem to be formulated as a mathematical optimisation problem. This paper provides a flexible mixed-integer programming formulation for the constrained optimal supervised discretisation problem with automatic trend detection, using a novel network flow approach. The flexibility of the proposed framework allows us to implement and compare different supervised discretisation predictive performance measures on a real-world dataset.
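As a rough illustration of the problem described in the abstract, the sketch below chooses cut-points over ordered pre-bins so that the event rate is non-decreasing across intervals and the number of intervals is capped. It is not the paper's mixed-integer programming formulation: it swaps in a simple dynamic programme over a graph whose nodes are candidate intervals, and it assumes information value (IV) as the predictive performance measure and a fixed non-decreasing trend rather than automatic trend detection. The function name `best_monotone_binning` and its parameters are hypothetical.

```python
from math import log


def best_monotone_binning(events, non_events, max_bins=3):
    """Pick cut-points over ordered pre-bins (illustrative assumption, not the paper's MIP).

    events[i] / non_events[i] are counts in pre-bin i, ordered by predictor value.
    Returns (total_iv, cuts), where cuts are the pre-bin indices at which a new
    interval starts, or None if no feasible binning exists.
    """
    n = len(events)
    tot_e, tot_n = sum(events), sum(non_events)

    # Prefix sums so that any interval [i, j) can be summarised in O(1).
    pe, pn = [0], [0]
    for e, m in zip(events, non_events):
        pe.append(pe[-1] + e)
        pn.append(pn[-1] + m)

    def stats(i, j):
        """Return (IV contribution, event rate) of interval [i, j), or None if undefined."""
        e, m = pe[j] - pe[i], pn[j] - pn[i]
        if e == 0 or m == 0:
            return None  # IV term is undefined for an empty class
        de, dn = e / tot_e, m / tot_n
        return (de - dn) * log(de / dn), e / (e + m)

    # best[(i, j, k)] = (total IV, right endpoints) of the best path that ends
    # with interval [i, j) and uses exactly k intervals.
    best = {}
    for j in range(1, n + 1):
        for i in range(j):
            cur = stats(i, j)
            if cur is None:
                continue
            iv, rate = cur
            if i == 0:
                best[(0, j, 1)] = (iv, [j])
                continue
            for h in range(i):
                prev_stats = stats(h, i)
                # Trend constraint: event rate must not decrease between intervals.
                if prev_stats is None or prev_stats[1] > rate:
                    continue
                for k in range(1, max_bins):
                    prev = best.get((h, i, k))
                    if prev is None:
                        continue
                    cand = (prev[0] + iv, prev[1] + [j])
                    key = (i, j, k + 1)
                    if key not in best or cand[0] > best[key][0]:
                        best[key] = cand

    # Only paths that cover every pre-bin are complete solutions.
    finals = [v for (i, j, k), v in best.items() if j == n]
    if not finals:
        return None
    total_iv, ends = max(finals, key=lambda v: v[0])
    return total_iv, ends[:-1]


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(best_monotone_binning([5, 1, 8], [5, 9, 2], max_bins=3))
```

Each path through consecutive intervals corresponds to one candidate discretisation, which is the intuition behind viewing the problem as a flow through a network; the side constraints here are limited to the trend and the maximum arity.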
Keywords
optimal predictor discretisation, network flow approach, banking, insurance