Scalable Splitting of Massive Data Streams
Database Systems for Advanced Applications, Pt. II, Proceedings (2010)
Abstract
Scalable execution of continuous queries over massive data streams often requires splitting input streams into parallel sub-streams
over which query operators are executed in parallel. Automatic stream splitting is difficult in general, because the optimal
parallelization may depend on application semantics. To enable application-specific stream splitting, we introduce splitstream
functions, in which the user specifies stream partitioning and replication non-procedurally. For high-volume streams, the stream
splitting itself becomes a performance bottleneck. We introduce a cost model that estimates the performance of splitstream
functions with respect to throughput and CPU usage. We implement parallel splitstream functions and relate the experimental results
to the cost model estimates. Based on these results, we propose a splitstream function called autosplit, which scales well for
high degrees of parallelism and is robust for varying proportions of stream partitioning and replication. We show how user-defined
parallelization using autosplit provides substantially improved scalability (L = 64, i.e. 64 expressways) over previously published
results for the Linear Road Benchmark.
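To make the distinction between partitioning and replication concrete, the following is a minimal sequential Python sketch, assuming a user-supplied routing function that returns the indices of the sub-streams each tuple should go to: one index means the tuple is partitioned, several mean it is replicated. The names `split_stream` and `route_by_key` and the `(kind, key)` tuple layout are hypothetical and do not reproduce the paper's splitstream API; the sketch shows only the routing semantics, not the parallel splitting whose cost the paper models.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Item = Tuple[str, int]  # hypothetical tuple layout: (kind, key)
Router = Callable[[Item], Sequence[int]]


def split_stream(stream: Iterable[Item], n: int, route: Router) -> List[List[Item]]:
    """Split `stream` into `n` sub-streams according to `route` (hypothetical helper).

    Lists stand in for the parallel sub-streams that would feed parallel operators.
    """
    substreams: List[List[Item]] = [[] for _ in range(n)]
    for item in stream:
        for i in route(item):
            if not 0 <= i < n:
                raise ValueError(f"route returned out-of-range index {i}")
            substreams[i].append(item)
    return substreams


def route_by_key(n: int) -> Router:
    """Hypothetical router: replicate 'control' tuples, partition others by key."""
    def route(item: Item) -> Sequence[int]:
        kind, key = item
        if kind == "control":
            return range(n)      # replication: send to every sub-stream
        return [key % n]         # partitioning: exactly one sub-stream
    return route


if __name__ == "__main__":
    stream = [("data", 0), ("data", 5), ("control", -1), ("data", 3)]
    for i, sub in enumerate(split_stream(stream, 4, route_by_key(4))):
        print(i, sub)
```

With four sub-streams, the three `data` tuples each land in exactly one sub-stream, while the `control` tuple is copied into all four, so the splitter emits seven tuples for four inputs. This illustrates why the proportion of replication affects the splitting cost.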
Keywords
non-procedural stream partitioning, massive data stream, stream partitioning, splitstream function, stream splitting, parallelization, distributed stream systems, scalable splitting, automatic stream splitting, high-volume stream, query optimization, application specific stream splitting, splitting input stream, parallel splitstream function, computer science