Ghost Imputation: Accurately Reconstructing Missing Data Of The Off Period

IEEE Transactions on Knowledge and Data Engineering(2020)

引用 22|浏览28
暂无评分
摘要
Noise and missing data are intrinsic characteristics of real-world data, leading to uncertainty that negatively affects the quality of knowledge extracted from the data. The burden imposed by missing data is often severe in sensors that collect data from the physical world, where large gaps of missing data may occur when the system is temporarily off or disconnected. How can we reconstruct missing data for these periods? We introduce an accurate and efficient algorithm for missing data reconstruction (imputation), that is specifically designed to recover off-period segments of missing data. This algorithm, Ghost, searches the sequential dataset to find data segments that have a prior and posterior segment that matches those of the missing data. If there is a similar segment that also satisfies the constraint - such as location or time of day - then it is substituted for the missing data. A baseline approach results in quadratic computational complexity, therefore we introduce a caching approach that reduces the search space and improves the computational complexity to linear in the common case. Experimental evaluations on five real-world datasets show that our algorithm significantly outperforms four state-of-the-art algorithms with an average of 18 percent higher F-score.
更多
查看译文
关键词
Sensors,Indexes,Uncertainty,Data mining,Computational complexity,Computer science,Batteries,Imputation,multivariate,pattern mining
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要