Provisioning and Evaluating Multi-domain Networked Clouds for Hadoop-based Applications

Cloud Computing Technology and Science (2011)

Abstract
This paper presents the design, implementation, and evaluation of a new system for on-demand provisioning of Hadoop clusters across multiple cloud domains. The clusters are composed of virtual machines from multiple cloud sites linked with bandwidth-provisioned network pipes. The prototype uses an existing federated cloud control framework called Open Resource Control Architecture (ORCA), which orchestrates the leasing and configuration of virtual infrastructure from multiple autonomous cloud sites and network providers. ORCA enables computational and network resources from multiple clouds and network substrates to be aggregated into a single virtual "slice" of resources, built to order for the needs of the application. The experiments examine various provisioning alternatives by evaluating the performance of representative Hadoop benchmarks and applications on resource topologies with varying bandwidths. The evaluations examine conditions in which multi-cloud Hadoop deployments offer significant advantages or incur significant disadvantages during Map/Reduce/Shuffle operations. Further, the experiments compare multi-cloud Hadoop deployments with single-cloud deployments and investigate Hadoop Distributed File System (HDFS) performance under varying network configurations. The results show that networked clouds make cross-cloud Hadoop deployment feasible when high-bandwidth network links connect the clouds. As expected, performance for some benchmarks degrades rapidly with constrained inter-cloud bandwidth. MapReduce shuffle patterns and certain HDFS operations that span the constrained links are particularly sensitive to network performance. Hadoop's topology-awareness feature can mitigate these penalties to a modest degree in these hybrid bandwidth scenarios. Additional observations show that contention among co-located virtual machines is a source of irregular performance for Hadoop applications on virtual cloud infrastructure.
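
The topology-awareness feature mentioned in the abstract is typically driven by a user-supplied script that Hadoop invokes with node addresses as arguments and that prints one rack path per address. The following is a minimal sketch of such a script, treating each cloud site as one "rack" so that the constrained inter-cloud link is visible to the scheduler and to HDFS block placement. The site names and subnets are hypothetical and are not taken from the paper. The script path would be set through the `topology.script.file.name` property in Hadoop releases of that era (`net.topology.script.file.name` in later releases).

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop topology script: maps node IPs to one rack per cloud site."""
import ipaddress
import sys

# Assumed site subnets, for illustration only (not from the paper).
# All rack paths are kept at the same depth, as Hadoop's default topology expects.
SITE_SUBNETS = {
    "/site-a": ipaddress.ip_network("10.1.0.0/16"),
    "/site-b": ipaddress.ip_network("10.2.0.0/16"),
}
DEFAULT_RACK = "/default-rack"  # Hadoop's conventional fallback rack name


def resolve_rack(host):
    """Return the rack path for an IP address, or the default rack if unknown."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames (non-IP arguments) are not resolved in this sketch.
        return DEFAULT_RACK
    for rack, subnet in SITE_SUBNETS.items():
        if addr in subnet:
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # Hadoop passes one or more addresses; print one rack path per argument.
    for host in sys.argv[1:]:
        print(resolve_rack(host))
```

With a mapping like this, nodes in the same cloud site share a rack, so Hadoop can prefer site-local task placement and reads. This is only one plausible way to expose a multi-cloud layout to Hadoop, and the paper's actual configuration may differ.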
Keywords
bandwidth-provisioned network pipe, multi-cloud Hadoop deployment, high bandwidth network link, multi-domain networked clouds, certain Hadoop, Hadoop application, network performance, representative Hadoop benchmarks, file system, Hadoop deployment, Hadoop-based applications, Hadoop cluster, virtual machine, bandwidth, virtual machines, prototypes, topology, network provisioning, distributed file system, cloud computing, benchmark testing, network topology