Accelerating MapReduce on a Coupled CPU-GPU Architecture

2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2012

Abstract
The work presented here is driven by two observations. First, heterogeneous architectures that integrate a CPU and a GPU on the same chip are emerging, and hold much promise for supporting power-efficient and scalable high performance computing. Second, MapReduce has emerged as a suitable framework for simplified parallel application development for many classes of applications, including data mining and machine learning applications that benefit from accelerators.

This paper focuses on the challenge of scaling a MapReduce application using the CPU and GPU together in an integrated architecture. We develop different methods for dividing the work: the map-dividing scheme, where map tasks are divided between both devices, and the pipelining scheme, which pipelines the map and reduce stages on different devices. We develop dynamic work distribution schemes for both approaches. To achieve high load balance while keeping scheduling costs low, we use a runtime tuning method to adjust task block sizes for the map-dividing scheme. Our implementation of MapReduce is based on a continuous reduction method, which avoids the memory overheads of storing key-value pairs.

We have evaluated the different design decisions using 5 popular MapReduce applications. For 4 of the applications, our system achieves 1.21 to 2.1 speedup over the better of the CPU-only and GPU-only versions. The speedups over a single CPU core execution range from 3.25 to 28.68. The runtime tuning method we have developed achieves very low load imbalance, while keeping scheduling overheads low. Though our current work is specific to MapReduce, many underlying ideas are also applicable towards intra-node acceleration of other applications on integrated CPU-GPU nodes.
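The map-dividing scheme described above can be sketched in miniature: map tasks sit in a shared pool, and each device repeatedly claims a block of tasks until the pool is empty, so a faster device naturally does more work. The sketch below simulates the two devices with threads and uses fixed, illustrative block sizes in place of the paper's runtime tuning; all names (`map_dividing`, the `"cpu"`/`"gpu"` labels) are assumptions for illustration, not the authors' implementation.

```python
import threading
from queue import Queue, Empty

def map_dividing(tasks, map_fn, block_sizes):
    """Dynamically distribute map tasks between simulated devices.

    block_sizes maps a device name to the number of tasks it claims
    per scheduling step (a stand-in for runtime-tuned block sizes).
    """
    pool = Queue()
    for t in tasks:
        pool.put(t)

    results = {}                 # shared result store
    counts = {name: 0 for name in block_sizes}
    lock = threading.Lock()

    def worker(name, block):
        while True:
            claimed = []
            try:
                for _ in range(block):        # claim up to one block
                    claimed.append(pool.get_nowait())
            except Empty:
                pass
            if not claimed:                   # pool exhausted: done
                return
            local = {t: map_fn(t) for t in claimed}
            with lock:                        # merge into shared store
                results.update(local)         # (rough stand-in for the
                counts[name] += len(claimed)  # paper's continuous reduction)

    threads = [threading.Thread(target=worker, args=(name, block))
               for name, block in block_sizes.items()]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results, counts

# Example: square 100 inputs; the "gpu" worker claims larger blocks,
# so under contention it tends to process more tasks.
res, counts = map_dividing(range(100), lambda x: x * x,
                           {"cpu": 4, "gpu": 16})
```

Because the pool is drained dynamically rather than split statically, neither device idles while work remains, which is the load-balancing property the paper's tuning method optimizes.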
Keywords
computer architecture, graphics processing units, pipeline processing, MapReduce, continuous reduction method, coupled CPU-GPU architecture, data mining, dynamic work distribution scheme, intra-node acceleration, machine learning, map-dividing scheme, parallel application, pipelining scheme, runtime tuning method, scalable high performance computing