Computation Reuse in Analytics Job Service at Microsoft.

SIGMOD/PODS '18: International Conference on Management of Data, Houston, TX, USA, June 2018

Abstract
Analytics-as-a-service, or analytics job service, is emerging as a new paradigm for data analytics, be it in a cloud environment or within enterprises. In this setting, users are not required to manage or tune their hardware and software infrastructure, and they pay only for the processing resources consumed per job. However, the shared nature of these job services across several users and teams leads to significant overlaps in partial computations, i.e., parts of the processing are duplicated across multiple jobs, thus generating redundant costs. In this paper, we describe a computation reuse framework, coined CLOUDVIEWS, which we built to address the computation overlap problem in Microsoft's SCOPE job service. We present a detailed analysis from our production workloads to motivate the computation overlap problem and the possible gains from computation reuse. The key aspects of our system are the following: (i) we reuse computations by creating materialized views over recurring workloads, i.e., periodically executing jobs that have the same script templates but process new data each time, (ii) we select the views to materialize using a feedback loop that reconciles the compile-time and run-time statistics and gathers precise measures of the utility and cost of each overlapping computation, and (iii) we create materialized views in an online fashion, without requiring an offline phase to materialize the overlapping computations.
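The selection idea in points (i) and (ii) can be illustrated with a small sketch. It is not the CLOUDVIEWS implementation: the `Op` plan node, the hash-based `signature`, the utility formula (run-time saved by every job after the first), and the greedy storage-budget rule are all assumptions made for illustration. It also ignores that nested candidates overlap in the savings they offer.

```python
from collections import defaultdict
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class Op:
    """A node in a simplified query plan (hypothetical, not SCOPE's plan format)."""
    name: str                 # operator plus normalized arguments
    children: tuple = ()      # child Op nodes
    runtime: float = 0.0      # observed run-time cost of this operator alone
    out_size: float = 0.0     # observed output size (storage cost if materialized)


def signature(op):
    """Hash of the subtree rooted at op; identical subtrees share a signature."""
    child_sigs = ",".join(signature(c) for c in op.children)
    return hashlib.sha256(f"{op.name}[{child_sigs}]".encode()).hexdigest()[:16]


def subtree_runtime(op):
    """Total observed run-time of op and everything below it."""
    return op.runtime + sum(subtree_runtime(c) for c in op.children)


def walk(op):
    """Yield every node of the plan tree rooted at op."""
    yield op
    for c in op.children:
        yield from walk(c)


def select_views(jobs, storage_budget):
    """Greedily pick overlapping subexpressions by utility per unit of storage."""
    seen = defaultdict(set)   # signature -> ids of the jobs that contain it
    sample = {}               # signature -> one representative subtree
    for job_id, root in enumerate(jobs):
        for op in walk(root):
            sig = signature(op)
            seen[sig].add(job_id)
            sample[sig] = op
    candidates = []
    for sig, job_ids in seen.items():
        op = sample[sig]
        # Skip subexpressions used by one job only, and bare leaf scans.
        if len(job_ids) < 2 or not op.children:
            continue
        utility = (len(job_ids) - 1) * subtree_runtime(op)
        candidates.append((utility / max(op.out_size, 1e-9), sig, utility, op.out_size))
    chosen, used = [], 0.0
    for _, sig, utility, size in sorted(candidates, reverse=True):
        if used + size <= storage_budget:
            chosen.append((sig, utility))
            used += size
    return chosen


# Two hypothetical jobs that share the same filter over the same input.
scan = Op("Scan(logs)", (), runtime=10.0, out_size=100.0)
filt = Op("Filter(level='error')", (scan,), runtime=5.0, out_size=20.0)
job_a = Op("Agg(count)", (filt,), runtime=2.0, out_size=1.0)
job_b = Op("Agg(count by host)", (filt,), runtime=3.0, out_size=4.0)
views = select_views([job_a, job_b], storage_budget=50.0)
```

Here the shared filter is the only candidate: it appears in both jobs, so materializing it once would save the second job the cost of recomputing the scan and the filter. In a recurring workload, the run-time and size figures would come from earlier executions of the same script template.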
Keywords
Materialized Views, Computation Reuse, Shared Clouds