Optimal classification trees with leaf-branch and binary constraints

Enhao Liu, Tengmu Hu, Theodore T. Allen, Christoph Hermes

Computers & Operations Research (2024)

Abstract
Using empirical models to predict whether sections within pipes have defects can save inspection costs and, potentially, avoid oil spills. Optimal Classification Tree (OCT) formulations offer potentially desirable combinations of interpretability and prediction accuracy on unseen pipes. Approaches based on state-of-the-art OCT formulations have enabled researchers to solve decision tree problems optimally instead of relying on traditional sub-optimal greedy approaches. Yet the recently proposed formulations also have limitations. Some of the most recent formulations require a large number of decision variables and constraints, leading to computational inefficiencies. Previous formulations have optimal solutions with undesirable or invalid tree structures, which may depend on the particular software implementation. Additionally, some formulations always grow a full tree even when desirable parsimonious tree options are available. This article proposes the Modified Optimal Classification Tree (M-OCT) formulation with novel leaf-branch-interaction constraints, which could stabilize the previous formulation and reduce the chance of invalid tree structures when generating optimal trees. By incorporating the idea of binary encoding of thresholds from a previous article, we reduce the total number of binary variables. We then extend M-OCT to construct a novel formulation called Binary Node Penalty Optimal Classification Tree (BNP-OCT) with binary splits and node complexity constraints, which support efficiency in standard branch-and-cut solvers and help prevent overfitting when learning optimal tree models. We compare the proposed methods with alternatives, including standard formulations, on 15 standard data sets. In addition, we use 750 test cases to compare the computational stability of pre-existing formulations with that of formulations involving the proposed leaf-branch constraints. We demonstrate that the proposed formulation offers advantages in accuracy, computational efficiency, and structural stability. We also describe how the proposed methods achieve 94% classification accuracy on balanced test sets for unseen pipes.
Keywords
Decision tree model, Machine learning, Mixed integer optimization