Design and evaluation of a novel dataflow based bigdata solution.

PPoPP (2015)

Abstract
As attention to big data grows, cluster computing systems for distributed processing of large data sets have become a mainstream and critical requirement in high-performance distributed systems research. One of the most successful such systems is Hadoop, which uses MapReduce as its programming and execution model and uses disks as the intermediate medium for processing huge volumes of data. However, there is now a consensus that Hadoop is not the final solution for big data, owing to the MapReduce programming model, disk-based computing, its synchronous execution model, and its restriction to batch processing, among other limitations. A new solution, and in particular a fundamental evolution, is needed to bring big data solutions into a new era. In this paper, we introduce a new cluster computing system called HAMR, which supports both batch and streaming processing. To achieve better performance, HAMR integrates HPC approaches, namely dataflow fundamentals, into a big data solution. More specifically, HAMR is designed entirely around in-memory computing to reduce unnecessary disk access overhead; task scheduling and memory management are fine-grained to expose more parallelism; and asynchronous execution improves the efficiency of compute resource usage and further improves workload balance across the whole cluster. The experimental results show that HAMR can outperform Hadoop by up to 10x in the same cluster environment.