Lahar: warehousing markovian streams (2010)

Abstract
A huge amount of the world's data is both sequential and low-level. Many applications consume higher-level information, such as words and sentences, that is inferred from low-level sequences such as raw audio signals using a model (e.g., a hidden Markov model). This inference process is typically statistical, resulting in high-level streams that are imprecise. Common queries on this data include sequence-finding event queries (e.g., "Find all times when the phrase 'Barack Obama...veto' occurs in the NPR news podcast from July 9."), aggregates of these event queries (e.g., "How many times do 2008 NPR podcasts use the phrase 'Barack Obama...veto'?"), and queries on the lineage of event queries (e.g., "What words appeared between the words 'Obama' and 'veto' in the previous query?"). These queries are difficult to support efficiently because of the large volumes and rich semantics of imprecise data, but they are critical for allowing applications to effectively leverage the rich information contained in these imprecise streams. In this thesis, we introduce Lahar, the first database system for a common type of imprecise, sequential model called a Markovian stream. Lahar includes algorithms for efficiently processing event queries, aggregated event queries, and event query lineage. Lahar accelerates performance and scalability using several techniques, including a set of novel Markovian stream indices and novel methods for approximating Markovian streams. Through experiments on two real-world datasets (one collected from an office-building RFID deployment and the other collected from audio podcasts), we demonstrate that Lahar is an efficient Markovian stream warehousing system.
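To make the idea of a sequence-finding event query over an imprecise stream concrete, the following is a minimal sketch, not Lahar's actual algorithm or API. It assumes a Markovian stream can be represented as an initial distribution over states plus one conditional probability table per timestep; the abstract does not spell out this representation, and the names `sequence_probability`, `initial`, `cpt0`, and `cpt1` as well as all probabilities are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

State = str
Marginal = Dict[State, float]
# Assumed layout: cpt[(prev, cur)] = P(X_{t+1} = cur | X_t = prev)
CPT = Dict[Tuple[State, State], float]


def sequence_probability(initial: Marginal, cpts: List[CPT],
                         pattern: List[State], start: int) -> float:
    """Probability that the stream takes exactly the states in `pattern`
    at consecutive timesteps start, start+1, ... (illustrative only)."""
    # Propagate the marginal distribution forward to timestep `start`.
    marginal = dict(initial)
    for t in range(start):
        nxt: Marginal = {}
        for (prev, cur), p in cpts[t].items():
            nxt[cur] = nxt.get(cur, 0.0) + marginal.get(prev, 0.0) * p
        marginal = nxt
    # Multiply the start marginal by the chain of transition probabilities.
    prob = marginal.get(pattern[0], 0.0)
    for i in range(1, len(pattern)):
        prob *= cpts[start + i - 1].get((pattern[i - 1], pattern[i]), 0.0)
    return prob


# Hypothetical three-timestep stream over three word states.
initial = {"obama": 0.5, "other": 0.5, "veto": 0.0}
cpt0 = {("obama", "veto"): 0.4, ("obama", "other"): 0.6,
        ("other", "veto"): 0.1, ("other", "other"): 0.9}
cpt1 = {("veto", "other"): 1.0,
        ("other", "obama"): 0.2, ("other", "other"): 0.8}

# Event query: at which start times does "obama" directly precede "veto"?
matches = [(t, sequence_probability(initial, [cpt0, cpt1], ["obama", "veto"], t))
           for t in range(2)]
```

Because consecutive timesteps are correlated, the probability of a match is computed from the transition tables rather than by multiplying per-timestep marginals independently. An aggregate query of the kind mentioned above could, under the same assumptions, combine such per-time probabilities across many streams.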
Keywords
warehousing markovian stream, novel Markovian stream index, efficient Markovian stream warehousing, approximating Markovian stream, Markovian stream, Barack Obama, event query, event query lineage, imprecise data, aggregated event query, imprecise stream