Lemonade: A scalable and efficient Spark-based platform for data analytics.

CCGrid (2017)

Abstract
Data analytics is concerned with discovering patterns and relevant knowledge in large amounts of data. In general, the task is complex and demands knowledge in very specific areas, such as massive data processing and parallel programming languages. However, analysts are usually not versed in Computer Science, but in the original data domain. To support them in such analysis, we present Lemonade (Live Exploration and Mining Of a Non-trivial Amount of Data from Everywhere), a platform for the visual creation and execution of data analysis workflows. Lemonade encapsulates the details of the storage and data processing environment, providing higher-level abstractions for data source access and algorithm coding. The goal is to enable batch and interactive execution of data analysis tasks, from basic ETL to complex data mining algorithms, in parallel, in a distributed environment. The current version supports HDFS (the Hadoop Distributed File System), local filesystems, and distributed environments such as Apache Spark, the state-of-the-art framework for Big Data analysis.
Keywords
data analytics, cloud, Spark