On High-Latency Bowtie Data Streaming

2022 IEEE International Conference on Big Data (Big Data)(2022)

Abstract
In this paper, we consider applications that read sequential data from n input files and write the result into m output files, which encompasses many types of external-memory sorting, database join/group queries, and MapReduce computation. We call this I/O model bowtie streaming and develop novel algorithms for modeling its throughput, maximizing sequential run lengths, and obtaining optimal multi-pass split/merge factors under non-trivial stream-switching (i.e., seek) delay. Based on these developments, we build a platform called Tuxedo for general bowtie computation and show that it is able to perform external-memory sorting with a million times fewer attempted seeks than Hadoop and two orders of magnitude fewer than highly optimized external-memory frameworks STXXL and nsort.
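To make the trade-off described above concrete, the following is a minimal back-of-the-envelope sketch (in Python) of how one might weigh sequential transfer time against stream-switching (seek) delay when choosing a multi-pass merge fan-in. The cost model, function names (`bowtie_pass_time`, `best_fan_in`), and parameter values are illustrative assumptions for a generic external-memory merge, not the throughput model or optimization actually derived in the paper.

```python
import math

def bowtie_pass_time(n_bytes, fan_in, mem_bytes, bandwidth, seek_delay):
    """Estimate wall-clock time of one merge pass that reads `fan_in`
    sequential runs and writes one output stream (hypothetical cost model).

    Each input stream gets mem_bytes / (fan_in + 1) of buffer (one share is
    reserved for the output), so every buffer refill moves that many bytes
    sequentially and costs one stream switch, i.e., one seek.
    """
    buf = mem_bytes / (fan_in + 1)          # per-stream buffer size
    switches = n_bytes / buf                # buffer refills => seeks per pass
    transfer = 2 * n_bytes / bandwidth      # read + write every byte once
    return transfer + switches * seek_delay

def best_fan_in(n_bytes, mem_bytes, bandwidth, seek_delay, max_fan_in=4096):
    """Pick the merge fan-in k that minimizes total time across all passes,
    assuming a k-way merge needs ceil(log_k(n_bytes / mem_bytes)) passes."""
    runs = max(1.0, n_bytes / mem_bytes)    # initial runs of ~memory size
    best = None
    for k in range(2, max_fan_in + 1):
        passes = math.ceil(math.log(runs, k)) if runs > 1 else 1
        total = passes * bowtie_pass_time(n_bytes, k, mem_bytes,
                                          bandwidth, seek_delay)
        if best is None or total < best[1]:
            best = (k, total)
    return best

if __name__ == "__main__":
    # Illustrative numbers: 1 TB of data, 8 GB of merge memory,
    # 200 MB/s sequential bandwidth, 8 ms per stream switch.
    k, t = best_fan_in(1e12, 8e9, 200e6, 0.008)
    print(f"estimated optimal fan-in: {k}, projected time: {t/3600:.1f} h")
```

Under this simplified model, larger fan-ins reduce the number of passes but shrink per-stream buffers and so increase the number of seeks, which is the tension the paper's split/merge-factor analysis addresses.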
Keywords
data, high-latency