Impact of Virtual Hadoop Cluster Scalability on the Performance of Big Data MapReduce Applications

Feras Al-Hawari, Khaled Tayem, Sahel Alouneh, Anass Al-Ksasbeh

Arab Conference on Information Technology (2023)

Abstract
Several cloud service providers use Hadoop to offer big data analytics and warehousing solutions. Servers in a cloud environment are very powerful in terms of CPUs and memory, so big data service providers typically deploy Hadoop on virtualized clusters to achieve better resource utilization and offer more cost-effective services. Nevertheless, server virtualization can introduce undesired computational overhead when the resources are underutilized. Therefore, this study introduces a methodology for scaling up the number of virtual machines (VMs) in a virtual Hadoop cluster so that a MapReduce application performs better on the virtual cluster than on a physical cluster, as measured by elapsed time, assuming both clusters contain equal and homogeneous servers. The proposed approach supports both server and storage virtualization, so settings such as the number of VMs, SAN multipathing mode, HDFS replication factor, and HDFS rack awareness are considered. The experimental results show that scaling the virtual cluster achieved a 3.54x speedup for a WordCount MapReduce application with a 750 GB workload when the number of VMs on each worker server was increased from one to eight.
Keywords
HDFS, YARN, big data, virtualization, scalability, cloud computing, private cloud