Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing

SIGMOD/PODS '20: International Conference on Management of Data, Portland, OR, USA, June 2020

Abstract
As the primary means of deriving decision-support insights, automated recurring routine analytic jobs account for a major share of cluster resource usage in modern enterprise data warehouses. These jobs usually have stringent schedules and deadlines determined by external business logic, and therefore cause pronounced resource skew and severe resource over-provisioning in the cluster. In this paper, we present Grosbeak, a novel data warehouse that supports resource-aware incremental computing to process recurring routine jobs, smooth resource skew, and optimize resource usage. Unlike batch processing in traditional data warehouses, Grosbeak exploits the fact that data is ingested continuously: it breaks an analysis job into small batches that incrementally process the progressively available data, and schedules these small-batch jobs intelligently when the cluster has free resources. In this demonstration, we showcase Grosbeak using real-world analysis pipelines. Users can interact with the data warehouse by registering recurring queries and observing the incremental scheduling behavior and the smoothed resource usage pattern.
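The core idea (split a recurring job into small batches over continuously ingested data, run them when the cluster is idle, and finish at the deadline) can be illustrated with a toy simulation. This is a minimal sketch, not Grosbeak's scheduler: the job type (a running sum), the cost model (one resource unit per record), the fixed per-tick capacity, the scenario numbers, and all names (`IncrementalSumJob`, `simulate`, `simulate_batch`) are hypothetical assumptions chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class IncrementalSumJob:
    """Hypothetical recurring job: sum every record ingested up to its deadline.

    A sum is decomposable, so each small batch can be folded into a running
    partial result as soon as it arrives (illustrative only, not Grosbeak's API).
    """
    partial: int = 0                                # incremental state
    pending: deque = field(default_factory=deque)   # ingested, not yet processed

    def ingest(self, batch):
        self.pending.append(batch)

    def run_one(self):
        """Process the oldest small batch and return its resource cost."""
        batch = self.pending.popleft()
        self.partial += sum(batch)
        return len(batch)  # assumed cost model: 1 resource unit per record


def simulate(arrivals, background, capacity, deadline):
    """Run small batches whenever they fit in free capacity; force the rest at the deadline."""
    job = IncrementalSumJob()
    usage = []
    for t in range(deadline + 1):
        if t < len(arrivals):
            job.ingest(arrivals[t])
        used = background[t]
        free = capacity - used
        # Opportunistic scheduling: only run a batch if it fits in idle capacity.
        while job.pending and len(job.pending[0]) <= free:
            cost = job.run_one()
            free -= cost
            used += cost
        # At the deadline, whatever is left must run regardless of capacity.
        if t == deadline:
            while job.pending:
                used += job.run_one()
        usage.append(used)
    return job.partial, usage


def simulate_batch(arrivals, background, deadline):
    """Baseline: process all ingested data in one batch at the deadline."""
    total = sum(sum(b) for b in arrivals)
    usage = list(background[: deadline + 1])
    usage[deadline] += sum(len(b) for b in arrivals)
    return total, usage


# Toy scenario (made-up numbers): 4 ticks, cluster capacity of 4 units.
ARRIVALS = [[1, 2], [3], [4, 5], [6]]   # records ingested at each tick
BACKGROUND = [1, 3, 1, 2]               # load from other jobs at each tick
CAPACITY, DEADLINE = 4, 3

inc_total, inc_usage = simulate(ARRIVALS, BACKGROUND, CAPACITY, DEADLINE)
bat_total, bat_usage = simulate_batch(ARRIVALS, BACKGROUND, DEADLINE)
print(inc_total, inc_usage)  # 21 [3, 4, 3, 3]
print(bat_total, bat_usage)  # 21 [1, 3, 1, 8]
```

Both strategies produce the same answer, but in this toy run the incremental schedule keeps per-tick usage within capacity (peak 4), whereas the batch baseline spikes to 8 at the deadline. Real analytic queries (joins, non-decomposable aggregates) typically need more elaborate incremental plans than a running sum.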