DynMR: dynamic MapReduce with ReduceTask interleaving and MapTask backfilling

EUROSYS(2014)

引用 44|浏览0
暂无评分
摘要
ABSTRACTIn order to improve the performance of MapReduce, we design DynMR. It addresses the following problems that persist in the existing implementations: 1) difficulty in selecting optimal performance parameters for a single job in a fixed, dedicated environment, and lack of capability to configure parameters that can perform optimally in a dynamic, multi-job cluster; 2) long job execution resulting from a task long-tail effect, often caused by ReduceTask data skew or heterogeneous computing nodes; 3) inefficient use of hardware resources, since ReduceTasks bundle several functional phases together and may idle during certain phases. DynMR adaptively interleaves the execution of several partially-completed ReduceTasks and backfills MapTasks so that they run in the same JVM, one at a time. It consists of three components. 1) A running ReduceTask uses a detection algorithm to identify resource underutilization during the shuffle phase. It then gives up the allocated hardware resources efficiently to the next task. 2) A number of ReduceTasks are gradually assembled in a progressive queue, according to a flow control algorithm in runtime. These tasks execute in an interleaved rotation. Additional ReduceTasks can be inserted adaptively to the progressive queue if the full fetching capacity is not reached. MapTasks can be back-filled therein if it is still underused. 3) Merge threads of each ReduceTask are extracted out as standalone services within the associated JVM. This design allows the data segments of multiple partially-complete ReduceTasks to reside in the same JVM heap, controlled by a segment manager and served by the common merge threads. Experiments show 10% ~ 40% improvements, depending on the workload.
更多
查看译文
关键词
reducetask interleaving,dynmr adaptively,dynamic mapreduce,partially-completed reducetasks,multiple partially-complete reducetasks,hardware resource,maptask backfilling,progressive queue,additional reducetasks,reducetasks bundle,associated jvm,jvm heap,reducetask data skew
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要