A hybrid method for missing value imputation

Proceedings of the 23rd Pan-Hellenic Conference on Informatics (2019)

Abstract
Missing values are a common occurrence in a great number of real-world datasets from diverse domains of interest. In research, missing data constitute a significant problem because they can affect the conclusions drawn from the data. They also make data preprocessing harder, since selecting an inappropriate way to handle missing information can lead to untrustworthy results. Unfortunately, as in most cases in Machine Learning, there is no single solution that fits every task related to the problem. For this reason, many strategies have been proposed to deal with this issue. One of the best known and most efficient is imputation. Replacing a missing value with an estimate apparently eliminates the problem and provides complete datasets, but the difficulty shifts to selecting the right method to impute the missing values. A widely used imputation method that can be found in the libraries of the most noted statistical and Machine Learning suites is IRMI. In this work, we propose a variant of IRMI that maintains the advantages of this well-known imputation method while outperforming the traditional variant used in many Machine Learning software tools. To achieve this, the benefits of boosting and of decision tree theory are exploited. To test the efficiency of our method, a series of experiments over 30 datasets was executed, measuring the classification accuracy obtained with the proposed method to show that it outperforms its rivals, which include classic as well as more sophisticated imputation strategies. Finally, the results of our study are provided, along with the conclusions that arise from them.
Keywords
data preprocessing, imputation strategies, machine learning, missing values imputation
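
The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the general idea behind iterative, regression-based imputation, the family to which IRMI-style methods are usually assigned. It is not the authors' method: a plain least-squares line stands in for the regression model, whereas the paper's variant relies on boosting and decision trees, which are not reproduced here. The names `fit_line` and `iterative_impute`, the two-column layout, and the use of `None` for missing cells are all assumptions made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (illustrative stand-in for the real model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Constant predictor: fall back to predicting the mean of y.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def iterative_impute(rows, n_iter=10):
    """Impute missing cells (None) in a two-column numeric table.

    Hypothetical helper: each column is repeatedly regressed on the
    other one, and only the originally missing cells are overwritten.
    """
    missing = [[v is None for v in row] for row in rows]
    filled = [list(row) for row in rows]  # work on a copy

    # Step 1: initialise every missing cell with its column's observed mean.
    for j in range(2):
        col_mean = mean(r[j] for r in rows if r[j] is not None)
        for r in filled:
            if r[j] is None:
                r[j] = col_mean

    # Step 2: refine the estimates column by column.
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j  # index of the predictor column
            train = [(r[k], r[j]) for r, m in zip(filled, missing) if not m[j]]
            a, b = fit_line([t[0] for t in train], [t[1] for t in train])
            for r, m in zip(filled, missing):
                if m[j]:
                    r[j] = a + b * r[k]
    return filled


# Toy data following y = 2x, with one missing value in each column.
data = [[1, 2], [2, 4], [3, None], [4, 8], [None, 10]]
completed = iterative_impute(data)
```

On this toy table the two gaps are filled consistently with the underlying relation y = 2x. Real datasets have more columns, mixed attribute types and noise, which is where the choice of the per-column model matters.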