Scalable Splitting of Massive Data Streams

DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PT II, PROCEEDINGS(2010)

Abstract
Scalable execution of continuous queries over massive data streams often requires splitting input streams into parallel sub-streams, over which query operators are executed in parallel. Automatic stream splitting is in general very difficult, as the optimal parallelization may depend on application semantics. To enable application-specific stream splitting, we introduce splitstream functions, with which the user specifies stream partitioning and replication non-procedurally. For high-volume streams, the stream splitting itself becomes a performance bottleneck. A cost model is introduced that estimates the performance of splitstream functions with respect to throughput and CPU usage. We implement parallel splitstream functions and relate experimental results to cost model estimates. Based on the results, a splitstream function called autosplit is proposed, which scales well for high degrees of parallelism and is robust for varying proportions of stream partitioning and replication. We show how user-defined parallelization using autosplit provides substantially improved scalability (L = 64) over previously published results for the Linear Road Benchmark.
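As a rough illustration of what splitting a stream by partitioning and replication means, the following is a minimal sketch in Python. It is not the paper's splitstream implementation or its autosplit function: the name `split_stream`, its parameters, and the tuple layout are hypothetical, and the sketch is sequential rather than parallel. Each tuple is either broadcast to every sub-stream (replication) or routed to exactly one sub-stream by hashing a user-supplied key (partitioning).

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def split_stream(
    stream: Iterable[T],
    n: int,
    partition_key: Callable[[T], Hashable],
    replicate: Callable[[T], bool],
) -> List[List[T]]:
    """Hypothetical splitter (an assumption, not the paper's API).

    Tuples for which replicate(t) is true are copied to all n sub-streams;
    all other tuples go to the sub-stream chosen by hashing partition_key(t).
    """
    outputs: List[List[T]] = [[] for _ in range(n)]
    for t in stream:
        if replicate(t):
            # Replication: every sub-stream receives the tuple.
            for out in outputs:
                out.append(t)
        else:
            # Partitioning: exactly one sub-stream receives the tuple.
            outputs[hash(partition_key(t)) % n].append(t)
    return outputs


# Illustrative tuples: (kind, expressway id, value). The layout is assumed.
events = [("pos", 0, 10), ("pos", 1, 20), ("query", 0, 0), ("pos", 2, 30)]
subs = split_stream(
    events,
    n=2,
    partition_key=lambda e: e[1],          # partition by expressway id
    replicate=lambda e: e[0] == "query",   # broadcast query tuples
)
```

In this sketch the user supplies only the key and the replication predicate, which loosely mirrors the idea of declaring the split rather than coding the routing by hand. The proportion of replicated tuples determines how much work the splitter itself performs per input tuple, which is the kind of cost the abstract says can become a bottleneck at high volume.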
Keywords
non-procedural stream partitioning, massive data stream, stream partitioning, splitstream function, stream splitting, parallelization, distributed stream systems, scalable splitting, automatic stream splitting, high-volume stream, query optimization, application-specific stream splitting, splitting input stream, parallel splitstream function, computer science