Interconnect Emulator for Aiding Performance Analysis of Distributed Memory Applications.

ICPE (2016)

Abstract
Many modern large-graph and Big Data processing applications operate on datasets that do not fit into the DRAM of a single machine. This leads to the design of scale-out applications, where the application dataset is partitioned and processed by a cluster of machines. Distributed memory applications exhibit complex behavior: they tend to interleave computations and communications, use bursty transfers, and rely on global synchronization primitives. This makes it difficult to analyze the impact of the communication layer on application performance and to answer questions such as: how might the interconnect latency or bandwidth characteristics change the application performance? Will the application performance scale when processed by a larger system? In this work, we introduce a novel emulation framework, called InterSense, which is implemented on top of an existing high-speed interconnect, such as InfiniBand, and which provides two performance knobs for changing the bandwidth and latency of today's interconnect. This approach offers an easy-to-use framework for sensitivity analysis of complex distributed applications to communication-layer performance, instead of creating customized and time-consuming application models to answer the same questions. We evaluate the emulator's accuracy with the popular OSU MPI benchmark suite on two clusters with different generations of InfiniBand interconnects (DDR and FDR): InterSense emulates the specified bandwidth and latency values with less than 2% error between the expected and measured values. To demonstrate InterSense's ease of use, we present a case study in which we apply InterSense to the sensitivity analysis of four applications and benchmarks, obtaining non-trivial insights.
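The abstract does not describe InterSense's actual implementation mechanism, but a common way to expose latency and bandwidth knobs on top of an MPI-based communication layer is PMPI interposition. The sketch below is purely illustrative and is not the authors' method; the environment variables EMU_LAT_US and EMU_BW_MBPS are hypothetical names, and the bandwidth model simply adds the transfer time at the emulated bandwidth, ignoring the native transfer time.

    /* Illustrative sketch only (assumed mechanism, not InterSense's):
     * interpose on MPI calls via the PMPI profiling interface and
     * inject delays controlled by two knobs. */
    #include <mpi.h>
    #include <stdlib.h>

    static double extra_latency_us = 0.0;  /* added per-message latency (us) */
    static double target_bw_mbps   = 0.0;  /* emulated bandwidth; 0 = off */

    static void emu_init(void)
    {
        /* EMU_LAT_US / EMU_BW_MBPS are made-up knob names for this sketch. */
        const char *lat = getenv("EMU_LAT_US");
        const char *bw  = getenv("EMU_BW_MBPS");
        if (lat) extra_latency_us = atof(lat);
        if (bw)  target_bw_mbps   = atof(bw);
    }

    /* Busy-wait for the requested number of microseconds. */
    static void emu_delay_us(double us)
    {
        double start = MPI_Wtime();
        while ((MPI_Wtime() - start) * 1e6 < us)
            ;
    }

    int MPI_Init(int *argc, char ***argv)
    {
        int rc = PMPI_Init(argc, argv);
        emu_init();
        return rc;
    }

    /* Interpose on MPI_Send: inject the configured latency plus a
     * size-dependent delay modeling the slower emulated bandwidth. */
    int MPI_Send(const void *buf, int count, MPI_Datatype dtype,
                 int dest, int tag, MPI_Comm comm)
    {
        double us = extra_latency_us;
        if (target_bw_mbps > 0.0) {
            int size;
            MPI_Type_size(dtype, &size);
            double bytes = (double)size * count;
            /* seconds at emulated bandwidth, converted to microseconds */
            us += bytes / (target_bw_mbps * 1e6) * 1e6;
        }
        if (us > 0.0) emu_delay_us(us);
        return PMPI_Send(buf, count, dtype, dest, tag, comm);
    }

Compiled into a shared library and preloaded (e.g., via LD_PRELOAD), such a wrapper would delay each message by a fixed latency term plus a size-dependent term; its accuracy could then be checked with point-to-point benchmarks such as those in the OSU suite, analogous to the validation described above.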
Keywords
Performance emulation, InfiniBand, MPI, distributed shared memory, benchmarking, profiling