Approximation trade-offs in Markovian stream processing: An empirical study

Data Engineering (2010)

Abstract
A large amount of the world's data is both sequential and imprecise. Such data is commonly modeled as Markovian streams; examples include words and sentences inferred from raw audio signals, or discrete location sequences inferred from RFID or GPS data. The rich semantics and large volumes of these streams make them difficult to query efficiently. In this paper, we study the effects of two common stream approximations on both efficiency and accuracy. Through experiments on a real-world RFID data set, we identify conditions under which these approximations can improve performance by several orders of magnitude with only minimal effects on query results. We also identify cases in which the full rich semantics are necessary.
In general, model choice has a significant impact on DBMS quality and performance: increased model complexity yields higher fidelity to the underlying data, and thus higher accuracy, but incurs additional computational and I/O costs. These high costs naturally raise the question of whether highly sophisticated imprecise sequence models are worthwhile. Would applications notice a difference in result quality if rich, imprecise streams were approximated using simple, deterministic ones? What performance benefits could be gained from such an approximation, which would allow high-performance, deterministic stream processing engines to be leveraged? How might a system achieve a flexible trade-off between the accuracy and efficiency of imprecise sequence processing?
In this paper, we study the performance and accuracy trade-offs of three common stream models: MAP (a deterministic model), independence (an uncorrelated model), and Markovian (a temporally correlated model). We perform our study in the context of two common types of sequence queries, event queries and their aggregated variants, and report results obtained using real-world location sequences inferred from an office-building RFID deployment.
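As a minimal sketch of how the two approximations relate to a Markovian stream, assume the stream is represented by an initial distribution over locations plus one conditional probability table (CPT) per timestep transition. Under that assumption, the independence model keeps only each timestep's marginal distribution, and the MAP model keeps a single most likely location per timestep. The representation, the location names, the numbers, the helper names (`marginals`, `independence_approx`, `map_approx`), and the per-timestep reading of "MAP" are all illustrative assumptions, not the paper's implementation.

```python
from typing import Dict, List

Dist = Dict[str, float]            # location -> probability
CPT = Dict[str, Dict[str, float]]  # previous location -> {next location -> probability}

def marginals(initial: Dist, cpts: List[CPT]) -> List[Dist]:
    """Propagate the Markov chain forward to obtain P(X_t) for every timestep t."""
    result = [dict(initial)]
    for cpt in cpts:
        nxt: Dist = {}
        for a, p_a in result[-1].items():
            for b, p_ab in cpt.get(a, {}).items():
                nxt[b] = nxt.get(b, 0.0) + p_a * p_ab
        result.append(nxt)
    return result

def independence_approx(initial: Dist, cpts: List[CPT]) -> List[Dist]:
    """Independence model (hypothetical helper): keep each marginal, discard the CPTs linking timesteps."""
    return marginals(initial, cpts)

def map_approx(initial: Dist, cpts: List[CPT]) -> List[str]:
    """MAP model (assumed per-timestep reading): one deterministic location per timestep."""
    return [max(d, key=d.get) for d in marginals(initial, cpts)]

# Hypothetical two-timestep stream; the numbers are illustrative, not from the paper.
initial: Dist = {"Hall": 0.6, "Office": 0.4}
cpts: List[CPT] = [{"Hall": {"Hall": 0.2, "Office": 0.8}, "Office": {"Office": 1.0}}]

print(independence_approx(initial, cpts))
print(map_approx(initial, cpts))
```

One plausible intuition for the cost ordering is representation size: the Markovian stream carries a full CPT per transition, the independence model one distribution per timestep, and the MAP model a single value per timestep, so a deterministic engine can consume the MAP output directly.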
Using the Markovian model as an accuracy baseline, we find that the independence approximation does not always yield higher accuracy than the simpler MAP approximation, despite its greater expressiveness. The accuracy of the two approximate models varies significantly with query characteristics (described in Section II-B). The performance of all three models is as expected: the independence and MAP approximations accelerate query processing by one and two orders of magnitude, respectively, relative to baseline performance on the Markovian model.
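The observation that independence is not always more accurate than MAP can be illustrated with a toy two-step event query, "at location a at timestep 0, then at location b at timestep 1", evaluated under each model against the Markovian answer. This is a hedged sketch: the stream, its numbers, and the helper names (`query_markovian`, `query_independence`, `query_map`) are hypothetical, MAP is assumed to mean the most likely location per timestep, and the query is far simpler than the event queries studied in the paper.

```python
from typing import Dict

Dist = Dict[str, float]
CPT = Dict[str, Dict[str, float]]

# Hypothetical two-timestep stream (illustrative numbers only).
initial: Dist = {"Hall": 0.6, "Office": 0.4}
cpt: CPT = {"Hall": {"Hall": 0.2, "Office": 0.8}, "Office": {"Office": 1.0}}

def next_marginal(prev: Dist, cpt: CPT) -> Dist:
    """One step of forward propagation: P(X_1) from P(X_0) and the transition CPT."""
    out: Dist = {}
    for a, p_a in prev.items():
        for b, p_ab in cpt.get(a, {}).items():
            out[b] = out.get(b, 0.0) + p_a * p_ab
    return out

m0, m1 = initial, next_marginal(initial, cpt)
map0, map1 = max(m0, key=m0.get), max(m1, key=m1.get)

def query_markovian(a: str, b: str) -> float:
    """P(X_0=a, X_1=b) using the transition CPT (exact for this stream)."""
    return m0.get(a, 0.0) * cpt.get(a, {}).get(b, 0.0)

def query_independence(a: str, b: str) -> float:
    """Product of marginals: ignores the correlation between X_0 and X_1."""
    return m0.get(a, 0.0) * m1.get(b, 0.0)

def query_map(a: str, b: str) -> float:
    """Deterministic 0/1 answer on the most likely location at each timestep."""
    return 1.0 if (map0, map1) == (a, b) else 0.0

for a, b in [("Hall", "Office"), ("Office", "Hall")]:
    exact = query_markovian(a, b)
    print(f"{a}->{b}: markovian={exact:.3f} "
          f"independence err={abs(query_independence(a, b) - exact):.3f} "
          f"MAP err={abs(query_map(a, b) - exact):.3f}")
```

In this toy stream, the independence model is closer to the baseline for Hall->Office, while MAP is exact for Office->Hall because the CPT forbids leaving the office, a correlation the independence model discards. Which approximation wins depends on the query, in line with the finding above.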
Keywords
Markov processes, approximation theory, data handling, query processing, GPS data, Markovian stream processing, RFID data set, stream approximation