Comparison of missing data imputation methods using weather data

PAKISTAN JOURNAL OF AGRICULTURAL SCIENCES (2023)

Abstract
Researchers and data analysts commonly face challenges when dealing with missing data in large data sets in their respective fields of study. Missing data must be handled properly to obtain better and more reliable research outcomes. The objective of this research is to evaluate different imputation techniques for handling missing observations in weather data. For this purpose, data on daily rainfall, maximum temperature (Tmax) and minimum temperature (Tmin) from 23 stations in Pakistan were obtained from the Pakistan Meteorological Department for the years 1981 to 2020. There are about 14610 observations of each variable in total, and each variable has a different number of missing observations, referred to as the size of missingness, at different stations. Four techniques were considered for estimating the missing observations at each station: mean imputation, k-nearest neighbors (KNN) imputation, predictive mean matching (PMM) imputation and sample imputation. Because the size of missingness varied from station to station, the imputation technique was chosen station by station as the one with the smallest root mean square error (RMSE). Compared with the other methods, KNN is the most appropriate technique for estimating missing rainfall observations at all stations, while mean imputation is recommended for the Tmax and Tmin data.
Keywords
Rainfall, temperature, missing data, imputation methods, root mean square error
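
The RMSE-based comparison described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the abstract does not say how RMSE was computed, so the sketch assumes that a fraction of known values is hidden, imputed, and compared with the truth. The data are synthetic, the function names, `hide_frac` and `k=5` are hypothetical choices, the KNN variant is a simple hand-written one that uses a fully observed predictor, and PMM is omitted for brevity.

```python
import numpy as np


def rmse(true_vals, imputed_vals):
    """Root mean square error between held-out truth and imputed values."""
    diff = np.asarray(true_vals, dtype=float) - np.asarray(imputed_vals, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mean_impute(x):
    """Replace NaNs with the mean of the observed values."""
    out = x.copy()
    out[np.isnan(out)] = np.nanmean(x)
    return out


def sample_impute(x, rng):
    """Replace NaNs with random draws (with replacement) from the observed values."""
    out = x.copy()
    missing = np.isnan(out)
    out[missing] = rng.choice(x[~missing], size=int(missing.sum()), replace=True)
    return out


def knn_impute(x, predictors, k=5):
    """Replace each NaN in x with the mean of x over the k observed rows
    whose predictor values are closest in Euclidean distance.
    Assumes `predictors` (shape n x p) has no missing values."""
    out = x.copy()
    missing = np.isnan(x)
    donors_x = x[~missing]
    donors_p = predictors[~missing]
    for i in np.flatnonzero(missing):
        dist = np.linalg.norm(donors_p - predictors[i], axis=1)
        nearest = np.argsort(dist)[:k]
        out[i] = donors_x[nearest].mean()
    return out


def compare_methods(x, predictors, hide_frac=0.1, k=5, seed=0):
    """Hide a fraction of the observed values, impute them with each
    method, and return the RMSE per method plus the best method's name."""
    rng = np.random.default_rng(seed)
    observed = np.flatnonzero(~np.isnan(x))
    n_hide = max(1, int(hide_frac * observed.size))
    hidden = rng.choice(observed, size=n_hide, replace=False)
    masked = x.copy()
    masked[hidden] = np.nan
    scores = {
        "mean": rmse(x[hidden], mean_impute(masked)[hidden]),
        "sample": rmse(x[hidden], sample_impute(masked, rng)[hidden]),
        "knn": rmse(x[hidden], knn_impute(masked, predictors, k)[hidden]),
    }
    return scores, min(scores, key=scores.get)


# Synthetic stand-in for one station (not the paper's data):
# Tmax is generated as Tmin plus an offset and noise.
demo_rng = np.random.default_rng(42)
tmin = demo_rng.normal(15.0, 5.0, 500)
tmax = tmin + 10.0 + demo_rng.normal(0.0, 1.0, 500)
scores, best = compare_methods(tmax, tmin.reshape(-1, 1))
print(scores, best)
```

Repeating `compare_methods` per station and per variable would give a station-wise choice of technique by minimal RMSE, in the spirit of the abstract. Which method wins depends on the data, so the synthetic result here says nothing about the paper's findings.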