Napa: powering scalable data warehousing with robust query performance at Google

Hosted Content(2021)

引用 9|浏览49
暂无评分
摘要
AbstractGoogle services continuously generate vast amounts of application data. This data provides valuable insights to business users. We need to store and serve these planet-scale data sets under the extremely demanding requirements of scalability, sub-second query response times, availability, and strong consistency; all this while ingesting a massive stream of updates from applications used around the globe. We have developed and deployed in production an analytical data management system, Napa, to meet these requirements. Napa is the backend for numerous clients in Google. These clients have a strong expectation of variance-free, robust query performance. At its core, Napa's principal technologies for robust query performance include the aggressive use of materialized views, which are maintained consistently as new data is ingested across multiple data centers. Our clients also demand flexibility in being able to adjust their query performance, data freshness, and costs to suit their unique needs. Robust query processing and flexible configuration of client databases are the hallmark of Napa design.Most of the related work in this area takes advantage of full flexibility to design the whole system without the need to support a diverse set of preexisting use cases. In comparison, a particular challenge we faced is that Napa needs to deal with hard constraints from existing applications and infrastructure, so we could not do a "green field" system, but rather had to satisfy existing constraints. These constraints led us to make particular design decisions and also devise new techniques to meet the challenges. In this paper, we share our experiences in designing, implementing, deploying, and running Napa in production with some of Google's most demanding applications.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要