An Exploration of Online Missing Value Imputation in Non-stationary Data Stream

SN Comput. Sci. (2021)

Abstract
Missing value imputation (MVI) is an important data preprocessing technique. Over the past decades, MVI has been widely studied, and many MVI approaches have been proposed based on either statistics or machine learning. However, previous methods focus only on static data and ignore imputation for dynamic online data. Intuitively, imputation errors may increase significantly when concept drift occurs in the data stream. In this paper, we investigate the impact of applying conventional MVI methods to non-stationary data streams. We also propose two sliding time window-based strategies to alleviate this impact: a plain average strategy, and a logarithmic weighted average strategy that gradually increases the weights of instances along the time axis. The proposed strategies are combined with three popular MVI techniques, mean imputation (MI), KNN imputation (KNNI), and Bayesian principal component analysis imputation (BPCAI), to show that the effect of the strategies is independent of the specific MVI technique. Experimental results on synthetic data sets with three different types of concept drift and on two real-world drifting data sets demonstrate the effectiveness and feasibility of the proposed strategies. Moreover, the impact of the time window size is investigated to guide parameter settings in future practical applications.
Keywords
Missing value imputation,Data stream,Concept drift,Slide time window,Weighting
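
The two window strategies named in the abstract can be illustrated with a minimal sketch for mean imputation. The abstract does not give the weight formula, so the logarithmic weights used here (`log(i + 2)` for the i-th oldest instance in the window) are an assumption chosen only to grow with recency. The class name `WindowMeanImputer` and its parameters are likewise hypothetical and are not taken from the paper.

```python
import math
from collections import deque


class WindowMeanImputer:
    """Mean imputation over a sliding time window (illustrative sketch only)."""

    def __init__(self, window_size, weighting="plain"):
        if weighting not in ("plain", "log"):
            raise ValueError("weighting must be 'plain' or 'log'")
        # deque with maxlen drops the oldest instance automatically.
        self.window = deque(maxlen=window_size)
        self.weighting = weighting

    def _weights(self, n):
        if self.weighting == "plain":
            return [1.0] * n
        # ASSUMPTION: weights grow logarithmically from oldest (index 0)
        # to newest (index n - 1); the paper's exact form may differ.
        return [math.log(i + 2) for i in range(n)]

    def impute(self, row):
        """Fill None entries of `row` from the window, then store the row."""
        weights = self._weights(len(self.window))
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            num = den = 0.0
            for w, past in zip(weights, self.window):
                if past[j] is not None:
                    num += w * past[j]
                    den += w
            # Leave the entry missing if the window has no observed value.
            filled[j] = num / den if den > 0 else None
        # Store the original row so imputed values never feed later estimates.
        self.window.append(list(row))
        return filled


# A one-feature stream whose level drifts from 1 to 10.
stream = [[1.0], [1.0], [1.0], [10.0], [10.0]]
plain = WindowMeanImputer(window_size=5, weighting="plain")
logw = WindowMeanImputer(window_size=5, weighting="log")
for row in stream:
    plain.impute(row)
    logw.impute(row)

plain_estimate = plain.impute([None])[0]
log_estimate = logw.impute([None])[0]
print(plain_estimate, log_estimate)
```

In this toy stream, the weighted estimate lies closer to the post-drift level than the plain average does, because recent instances carry larger weights. A smaller `window_size` would have a similar effect by discarding pre-drift instances sooner, which is the trade-off the window size study in the abstract refers to.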