J. Armando Barron-Lugo, J.L. Gonzalez-Compean, Ivan Lopez-Arevalo, Jesus Carretero, Jose L. Martinez-Rodriguez

Future Generation Computer Systems (2023)

Abstract
This paper presents Xel, a cloud-agnostic data platform for the design-driven building of high-availability data science services as a support tool for data-driven decision-making. We designed and implemented Xel based on four main components: (a) a high-level, design-driven framework that lets end-users select analytic and machine learning tools from a service mesh and couple them into processing pipelines; (b) a new recursive ETL processing model that automatically converts the pipeline designs into infrastructure-agnostic software structures, which are deployed on multiple infrastructures; (c) an orchestration model that transparently manages data delivery throughout each stage of the processing pipelines used in data science systems; and (d) a decentralized data model that transparently masks service unavailability, such as cloud outages and the unavailability of either applications or data. Using Xel, real users created data science services such as deep learning analysis of scientific publications, clustering of movie reviews, and an exploratory cancer study. These services were evaluated as case studies, which showed that this platform design enables end-users to create multiple types of data science pipelines without programming or manual configuration, and that it automatically masks the unavailability of cloud resources and data. The platform is currently used to create a national cancer observatory and big data systems that fuse suicide, mental health, drug consumption, and macroeconomic datasets to find spatiotemporal patterns.

• Design-driven platform for end-users to build data science pipelines at a high level.
• Recursive ETL processing model converts pipelines into cloud-agnostic software.
• Data orchestration model manages delivery/retrieval throughout data science systems.
• A decentralized model masks unavailability of applications and data.
• A prototype converts pipeline designs into high-availability data science services.
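
The abstract does not describe how the recursive ETL model of component (b) is implemented. As a rough illustration of the general idea only, the sketch below assumes that each stage extracts its input, transforms it, and loads the result into the next stage, and that a pipeline is itself a stage so that pipelines can nest. The names `Stage` and `Pipeline` and the sample data are hypothetical and are not taken from Xel.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Stage:
    """Hypothetical ETL unit: extract input, transform it, load the result."""
    name: str
    transform: Callable[[list], list]

    def run(self, data: list) -> list:
        extracted = list(data)                   # extract: take a private copy
        transformed = self.transform(extracted)  # transform: apply the tool
        return transformed                       # load: hand over to the next stage


@dataclass
class Pipeline:
    """A pipeline exposes the same run() as a stage, so pipelines can nest."""
    name: str
    stages: List[Union["Stage", "Pipeline"]] = field(default_factory=list)

    def run(self, data: list) -> list:
        for stage in self.stages:
            data = stage.run(data)  # the output of one stage feeds the next
        return data


# Illustrative stages (assumed, not from the paper).
clean = Stage("clean", lambda rows: [r.strip().lower() for r in rows if r.strip()])
dedupe = Stage("dedupe", lambda rows: sorted(set(rows)))
count = Stage("count", lambda rows: [len(rows)])

prepare = Pipeline("prepare", [clean, dedupe])  # inner pipeline
service = Pipeline("service", [prepare, count])  # outer pipeline reuses it

result = service.run([" Great ", "great", "", "Poor"])
print(result)
```

Because `Pipeline` and `Stage` share one interface, an existing pipeline can be reused as a building block of a larger one without the caller knowing whether it is a single tool or a composite.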
Keywords
Data-driven decision making, Big data, Data orchestration, Cloud computing, Fault tolerance, Data analytics