Accelerating and Scaling Data Products with Serverless

Ángel Luis Sanz Pérez, B. K. Vasil'ev, Żeljko Agić, Christoffer Thrysøe, Viktor Hargitai, Mads Dahlgaard, C. Rossel

Lecture Notes on Data Engineering and Communications Technologies (2023)

Abstract
Managing a comprehensive portfolio of data products with scaling capabilities is an important task for an organization-wide analytics team. These data products can be broken down into components that use serverless offerings from cloud service providers, allowing a team of modest size to manage company-wide analytics and data science solutions while improving productivity and promoting data-driven decisions. This work describes an architecture and the tools used to speed up the development and management of data offerings, including data visualizations, pipelines, models, and APIs. Component design considerations, namely reusability, integration, and maintainability, are discussed along with their impact on team productivity. The components described are: data ingestion using a containerized solution as a fundamental layer for all applications, including its execution, orchestration, and monitoring, combined with traditional pipelines to enrich the available data; APIs for data and model serving, using containerized solutions as a building block for data products powered by machine learning models and for serving a unified data ontology; and data visualization in the form of containerized web apps that provide fast solutions for data exploration, model predictions, visualization, and user insights. The architectures of three data products are then described as aggregations of the distinct building blocks (components) that were developed, showing how those blocks can be repurposed for different applications. This includes continuous integration and delivery, as well as the pairing of each solution with the corresponding products from cloud service providers (e.g., Google Cloud Platform), to provide real-world examples. The scaling of each solution is discussed, along with lessons learned and pitfalls encountered regarding security, usability, and maintenance.
The proposed components and architecture allowed a team of 7 members to provide analytics solutions for an organizational section of approximately 1,500 employees, which illustrates the potential of serverless for rapid prototyping and for empowering effective teams.
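To make the "APIs for data and model serving" building block more concrete, the following is a minimal sketch, not the authors' implementation, of a containerized prediction endpoint. It assumes a Python WSGI app served with the standard library's `wsgiref` server and a serverless container platform (such as Cloud Run) that passes the listening port through the `PORT` environment variable. The linear scorer (`WEIGHTS`, `BIAS`, `predict`) is a hypothetical stand-in for a trained model, and the `/predict` and `/healthz` routes and the `{"instances": [...]}` request shape are likewise illustrative assumptions.

```python
import json
import os
from wsgiref.simple_server import make_server

# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS = {"tenure_years": 0.3, "projects": 0.5}
BIAS = 1.0


def predict(features: dict) -> float:
    """Score one record; features missing from the record count as 0."""
    return BIAS + sum(w * float(features.get(name, 0.0)) for name, w in WEIGHTS.items())


def app(environ, start_response):
    """Minimal WSGI app exposing GET /healthz and POST /predict."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "/")
    if method == "GET" and path == "/healthz":
        status, body = "200 OK", {"status": "ok"}
    elif method == "POST" and path == "/predict":
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
            # Assumed request shape: {"instances": [{...}, ...]}
            predictions = [predict(record) for record in payload["instances"]]
            status, body = "200 OK", {"predictions": predictions}
        except (KeyError, ValueError, TypeError, AttributeError) as exc:
            # Malformed JSON, missing "instances", or non-numeric features.
            status, body = "400 Bad Request", {"error": repr(exc)}
    else:
        status, body = "404 Not Found", {"error": "not found"}
    data = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]


def serve() -> None:
    """Run the app; serverless container platforms such as Cloud Run set $PORT."""
    port = int(os.environ.get("PORT", "8080"))
    with make_server("", port, app) as server:
        server.serve_forever()
```

Packaged in a container image, the entrypoint could be something like `python -c "import app; app.serve()"`, assuming the file is saved as `app.py`. Keeping the model behind a plain HTTP interface is what lets the same block be reused by several data products; a production deployment would typically swap `wsgiref` for a dedicated WSGI server such as gunicorn.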
Keywords
scaling data products