GP with a Hybrid Tree-vector Representation for Instance Selection and Symbolic Regression on Incomplete Data

2021 IEEE Congress on Evolutionary Computation (CEC), 2021

Abstract
Data incompleteness is a pervasive problem in symbolic regression, and in machine learning in general. Unfortunately, most symbolic regression methods are applicable only when the given data is complete. One common approach to handling this situation is data imputation, which estimates missing values from the existing data. However, which existing data should be used to impute the missing values? The answer to this question matters when dealing with incomplete data. To address it, this work proposes a mixed tree-vector representation for genetic programming (GP) that performs instance selection and symbolic regression on incomplete data. In this representation, each individual has two components: an expression tree and a bit vector. The tree component constructs the symbolic regression model, while the vector component selects the instances used to impute missing values with the weighted k-nearest neighbour (WKNN) imputation method. The complete, imputed instances are then used to evaluate the GP-based symbolic regression model. Experimental results show that the proposed method is applicable to real-world data sets under different missingness scenarios. Compared with existing methods, the proposed method not only produces more effective symbolic regression models but also achieves more efficient imputation.
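The tree-plus-bit-vector idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the distance measure (root mean squared difference over shared observed features), the inverse-distance weights, the default `k`, the mean-squared-error fitness, and the names `wknn_impute` and `fitness` are all assumptions made for illustration. The "tree" is stood in for by a plain Python callable.

```python
import numpy as np

def wknn_impute(X, mask, k=3):
    """Fill NaNs in X, using only rows selected by the bit vector `mask` as donors.

    Assumed scheme: distance is the RMS difference over features observed in
    both rows; weights are inverse distances.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    donors = X[np.asarray(mask, dtype=bool)]
    for i, row in enumerate(X):
        for j in np.where(np.isnan(row))[0]:
            # Only donors with feature j observed can supply a value.
            cand = donors[~np.isnan(donors[:, j])]
            dists, vals = [], []
            for d in cand:
                shared = ~np.isnan(row) & ~np.isnan(d)
                if shared.any():
                    dists.append(np.sqrt(np.mean((row[shared] - d[shared]) ** 2)))
                    vals.append(d[j])
            if not dists:
                continue  # no usable donor: leave the value missing
            dists, vals = np.array(dists), np.array(vals)
            nearest = np.argsort(dists)[:k]
            w = 1.0 / (dists[nearest] + 1e-9)  # inverse-distance weights
            out[i, j] = np.sum(w * vals[nearest]) / np.sum(w)
    return out

def fitness(tree, mask, X, y, k=3):
    """Evaluate one individual (tree, mask): impute, then score the tree by MSE."""
    X_imp = wknn_impute(X, mask, k)
    complete = ~np.isnan(X_imp).any(axis=1)  # skip rows that stayed incomplete
    pred = tree(X_imp[complete])
    return float(np.mean((pred - np.asarray(y, dtype=float)[complete]) ** 2))

# Toy data: the second feature of the third instance is missing.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, np.nan], [4.0, 8.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
tree = lambda data: data[:, 1]          # stand-in for an evolved expression tree

full_mask = [1, 1, 1, 1]                # every instance may act as a donor
small_mask = [1, 1, 0, 0]               # only the first two instances may
err_full = fitness(tree, full_mask, X, y, k=2)
err_small = fitness(tree, small_mask, X, y, k=2)
```

In this toy setting the two bit vectors lead to different imputed values and therefore different errors for the same tree, which is the coupling between instance selection and model quality that the representation is meant to exploit. A smaller donor set also means fewer distance computations per missing value.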
Keywords
Symbolic Regression, Genetic Programming, Incomplete Data, Imputation, Instance Selection