CIRCUMFLEX: a scheduling optimizer for MapReduce workloads with shared scans.

ACM SIGOPS Operating Systems Review (2012)

Abstract
We consider MapReduce clusters designed to support multiple concurrent jobs, concentrating on environments in which the number of distinct datasets is modest relative to the number of jobs. Many datasets in such scenarios wind up being scanned by multiple concurrent Map phase jobs. As has been noticed previously, this scenario provides an opportunity for Map phase jobs to cooperate, sharing the scans of these datasets, and thus reducing the costs of such scans. Our paper has two main contributions. First, we present a novel and highly general method for sharing scans and thus amortizing their costs. This concept, which we call cyclic piggybacking, has a number of advantages over the more traditional batching scheme described in the literature. Second, we describe a significant but natural generalization of the recently introduced flex scheduler, for optimizing schedules within the context of this cyclic piggybacking paradigm. The overall approach, including both cyclic piggybacking and the flex generalization, is called circumflex. We demonstrate the excellent performance of circumflex via a variety of simulation experiments.
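
The abstract does not spell out the mechanics of cyclic piggybacking, so the following is only a rough sketch of the general idea of amortizing scan costs, under an assumed model: a dataset of `num_blocks` blocks is read by one circular scan, a job that arrives mid-scan joins at the current position, and it finishes once the scan has wrapped around to where it joined. The function names, the unit of time (one block read per step), and the arrival times are all hypothetical and are not taken from the paper.

```python
def independent_scan_reads(num_blocks, arrival_times):
    """Block reads when every job scans the dataset on its own."""
    return num_blocks * len(arrival_times)


def shared_scan_reads(num_blocks, arrival_times):
    """Block reads when jobs piggyback on one circular scan (assumed model).

    A job arriving at step `a` needs the scan to run during the half-open
    interval [a, a + num_blocks). The scan reads one block per step while
    at least one job is active, so the total number of reads is the length
    of the union of these intervals.
    """
    intervals = sorted((a, a + num_blocks) for a in arrival_times)
    total = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Gap with no active job: close the previous busy period.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping job: it shares the blocks read from `start` onward.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


if __name__ == "__main__":
    arrivals = [0, 3, 7]  # hypothetical arrival steps
    print(independent_scan_reads(10, arrivals))
    print(shared_scan_reads(10, arrivals))
```

Under this assumed model, jobs do not have to wait for a batch to form: a late arrival shares whatever portion of the scan remains and only the blocks it missed are read again on the wrap-around.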