JS4Cloud: script‐based workflow programming for scalable data analysis on cloud platforms

Concurrency and Computation: Practice and Experience (2015)

Abstract
Workflows are an effective paradigm to model complex data analysis processes, such as knowledge discovery in databases applications, which can be efficiently executed on distributed computing systems such as a Cloud platform. Data analysis workflows can be designed through visual programming, which is a convenient design approach for high-level users. On the other hand, script-based workflows are a useful alternative to visual workflows, because they allow expert users to program complex applications more effectively. In order to provide Cloud users with an effective script-based data analysis workflow formalism, we designed the JS4Cloud language. The main benefits of JS4Cloud are as follows: (i) it extends the well-known JavaScript language while using only its basic functions (arrays, functions, and loops); (ii) it implements both a data-driven task parallelism that automatically spawns ready-to-run tasks to the Cloud resources and data parallelism through an array-based formalism; and (iii) these two types of parallelism are exploited implicitly so that workflows can be programmed in a fully sequential way, which frees users from duties like work partitioning, synchronization, and communication. We describe how JS4Cloud has been integrated within the Data Mining Cloud Framework (DMCF), a system supporting the scalable execution of data analysis workflows on Cloud platforms. In particular, we describe how data analysis workflows modeled as JS4Cloud scripts are processed by DMCF by exploiting parallelism to enable their scalable execution on Clouds. Finally, we present some data analysis workflows developed with JS4Cloud and the performance results obtained by executing such workflows on DMCF. Copyright (c) 2015 John Wiley & Sons, Ltd.
Keywords
JS4Cloud, data analysis, workflows, cloud computing, scalability
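
The abstract's points (ii) and (iii) can be made concrete with a small sketch in plain JavaScript. It is not JS4Cloud code and does not use any DMCF API: `planWaves`, the task records, and the data names (`dataset`, `part[i]`, `model[i]`, `best`) are all hypothetical, chosen only for illustration. The workflow is declared in a fully sequential style using an array-like naming scheme and a loop, and a toy planner then derives which tasks could run concurrently purely from the data each task reads and writes. Grouping tasks into "waves" is a simplifying assumption; a real data-driven runtime would more likely start each task as soon as its own inputs are ready.

```javascript
// Hypothetical illustration only: these names are NOT part of JS4Cloud or DMCF.

// Groups tasks into waves. Every task in a wave has all of its inputs
// available, so the tasks of one wave could run in parallel.
function planWaves(tasks, initialData) {
  const ready = new Set(initialData); // data items that already exist
  const pending = [...tasks];
  const waves = [];
  while (pending.length > 0) {
    // A task is runnable once every data item it reads has been produced.
    const runnable = pending.filter((t) => t.inputs.every((d) => ready.has(d)));
    if (runnable.length === 0) {
      const names = pending.map((t) => t.name).join(", ");
      throw new Error("unsatisfiable dependencies: " + names);
    }
    waves.push(runnable.map((t) => t.name));
    for (const t of runnable) {
      t.outputs.forEach((d) => ready.add(d)); // outputs become available
      pending.splice(pending.indexOf(t), 1);  // task is done
    }
  }
  return waves;
}

// The workflow is written sequentially; no explicit synchronization appears.
const tasks = [];
tasks.push({
  name: "partition",
  inputs: ["dataset"],
  outputs: ["part[0]", "part[1]", "part[2]"],
});
for (let i = 0; i < 3; i++) {
  // Array-style data parallelism: one independent task per array element.
  tasks.push({ name: `train[${i}]`, inputs: [`part[${i}]`], outputs: [`model[${i}]`] });
}
tasks.push({
  name: "select",
  inputs: ["model[0]", "model[1]", "model[2]"],
  outputs: ["best"],
});

const waves = planWaves(tasks, ["dataset"]);
console.log(JSON.stringify(waves));
// prints [["partition"],["train[0]","train[1]","train[2]"],["select"]]
```

In this sketch the three `train[i]` tasks fall into the same wave because each depends only on its own partition, while `select` waits for all three models. The script author never states that ordering; it follows from the data dependencies alone, which is the sense in which the abstract describes the parallelism as implicit.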