Hunting Down Glitches in Massive Time Series Data

IQ (2000)

Abstract
In a previous paper [5] presented at IQ’99, we proposed a method for isolating data glitches in massive data sets using a data mining technique called DataSpheres. The technique runs in linear time and isolates the sections of a data set that contain corrupted or abnormal data. In this paper, we apply the DataSphere technique to isolate problems in time series data. For each data point, we define two types of multivariate deviation in time and space: relative and within. We discretize the attribute space into states and construct a one-step Markov chain model that summarizes movement between the states. The relative deviation is based on low-likelihood transitions and is used to flag suspicious movements. The within deviation is specific to a data point and helps separate legitimate movements (e.g., bursty traffic) from data glitches (e.g., missing data). The proposed methods are distribution-free, which makes them widely applicable. They are also simple and can be computed from summaries, so they require very little storage. We demonstrate the method on real network data, isolating “abnormal” data movements over time, and conclude with a proposed set of general actions to take based on the glitches our algorithm detects.
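The state-and-transition idea behind the relative deviation can be sketched in Python. This is a minimal sketch under stated assumptions and not the authors' implementation. The helper names (`discretize`, `transition_probs`, `flag_rare_transitions`), the fixed per-attribute cut points, and the 0.05 probability threshold are all hypothetical. It covers only the relative deviation (low-likelihood transitions), because the abstract does not define the within deviation in enough detail to sketch it.

```python
import bisect
from collections import Counter, defaultdict

# Hypothetical helpers: names, binning scheme, and threshold are illustrative
# assumptions, not the method described in the paper.

def discretize(series, edges):
    """Map each multivariate observation to a state (a tuple of bin indices).

    edges[j] is a sorted list of cut points for attribute j (assumed given,
    e.g. chosen from quantiles of historical data).
    """
    return [
        tuple(bisect.bisect_right(edges[j], value) for j, value in enumerate(obs))
        for obs in series
    ]

def transition_probs(states):
    """Estimate a one-step (first-order) Markov chain from consecutive state pairs."""
    counts = defaultdict(Counter)
    for src, dst in zip(states, states[1:]):
        counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs

def flag_rare_transitions(states, probs, threshold):
    """Return (t, src, dst, p) for each move into step t whose estimated probability is below threshold."""
    flagged = []
    for t in range(1, len(states)):
        src, dst = states[t - 1], states[t]
        p = probs.get(src, {}).get(dst, 0.0)
        if p < threshold:
            flagged.append((t, src, dst, p))
    return flagged

# Toy two-attribute series: steady low values with a single spike at t = 20.
series = [(1, 1), (2, 2)] * 10 + [(100, 100)] + [(1, 1), (2, 2)] * 5
edges = [[10, 50], [10, 50]]  # per-attribute bins: < 10, [10, 50), >= 50
states = discretize(series, edges)
probs = transition_probs(states)
flags = flag_rare_transitions(states, probs, threshold=0.05)
print(flags)
```

In this toy run, only the move into the spike at t = 20 is flagged, with estimated probability 1/29. The move back out is not flagged, because state (2, 2) has a single observed exit and the naive maximum-likelihood estimate assigns it probability 1. This shows that sparsely visited states need care when transition likelihoods alone are used to judge movements.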
Keywords
time series data