An Experimental Comparison of Iterative MapReduce Frameworks

ACM International Conference on Information and Knowledge Management (2016)

Abstract
MapReduce has become a dominant framework in big data analysis, and thus there have been significant efforts to implement various data analysis algorithms in MapReduce. Many data analysis algorithms are inherently iterative, repeating the same set of tasks until convergence. To efficiently support iterative algorithms at scale, a few variants of Hadoop and new platforms have been proposed and actively developed in both academia and industry. Representative systems include HaLoop, iMapReduce, Twister, and Spark. In this paper, we experimentally compare Hadoop and the aforementioned systems using various workloads and metrics. The five systems are compared through four iterative algorithms (PageRank, recursive query, k-means, and logistic regression) on 50 Amazon EC2 machines (200 cores in total). We thoroughly explore the effectiveness of their new caching, communication, and scheduling mechanisms in support of iterative computation. Our evaluation also shows how performance depends on data skewness and memory residency. Overall, we believe that our evaluation and interpretation will be useful for designing a new framework or improving existing ones.
Keywords
Iterative algorithms, MapReduce, Hadoop, Spark, Benchmark