Using Multiple Threads to Accelerate Single Thread Performance

Zehra Sura, Kevin O'Brien, José R. Brunheroto

Phoenix, AZ (2014)

Abstract

Computing systems are being designed with an increasing number of hardware cores. To effectively use these cores, applications need to maximize the amount of parallel processing and minimize the time spent in sequential execution. In this work, we aim to exploit fine-grained parallelism beyond the parallelism already encoded in an application. We define an execution model using a primary core and some number of secondary cores that collaborate to speed up the execution of sequential code regions. This execution model relies on cores that are physically close to each other and have fast communication paths between them. For this purpose, we introduce dedicated hardware queues for low-latency transfer of values between cores, and define special "enque" and "deque" instructions to use the queues. Further, we develop compiler analyses and transformations to automatically derive fine-grained parallel code from sequential code regions. We implemented this model for exploiting fine-grained parallelization in the IBM XL compiler framework and in a simulator for the Blue Gene/Q system. We also studied the Sequoia benchmarks to determine code sections where our techniques are applicable. We evaluated our work using these code sections, and observed an average speedup of 1.32 on 2 cores, and an average speedup of 2.05 on 4 cores. Since these code sections are otherwise sequentially executed, we conclude that our approach is useful for accelerating single thread performance.
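
The primary/secondary execution model described in the abstract can be illustrated with a small software analogy. This is only a sketch under stated assumptions, not the paper's compiler output or hardware interface: Python's `queue.Queue` stands in for a dedicated hardware queue, `put` and `get` stand in for the "enque" and "deque" instructions, and the loop body and the function names `sequential` and `split_across_two_threads` are hypothetical. Python threads will not reproduce the reported speedups; the sketch shows only how one sequential loop body might be divided so that a secondary thread computes part of each iteration and forwards the value to the primary thread.

```python
import queue
import threading


def sequential(xs):
    """Original sequential region: both parts of the body run on one thread."""
    total = 0
    for x in xs:
        a = x * x + 1   # part A of the loop body
        b = 3 * x - 2   # part B of the loop body (independent of part A)
        total += a * b
    return total


def split_across_two_threads(xs):
    """Same computation, with part B moved to a secondary thread.

    The bounded queue is a software stand-in (an assumption) for a
    hardware queue between two physically close cores.
    """
    q = queue.Queue(maxsize=8)

    def secondary():
        for x in xs:
            q.put(3 * x - 2)    # analogous to "enque": send part B's value

    worker = threading.Thread(target=secondary)
    worker.start()

    total = 0
    for x in xs:
        a = x * x + 1           # primary thread still computes part A
        b = q.get()             # analogous to "deque": receive part B's value
        total += a * b

    worker.join()
    return total


if __name__ == "__main__":
    data = list(range(1000))
    print(sequential(data) == split_across_two_threads(data))
```

Because the queue is first-in, first-out and each thread walks the input in the same order, the value received in iteration i is the one produced for iteration i, so the split version computes the same result as the sequential one.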

Keywords

multi-threading, parallelising compilers, program diagnostics, software performance evaluation, Blue Gene/Q system, IBM XL compiler framework, automatic fine-grained parallel code generation, code sections, compiler analysis, computing systems, deque instructions, enque instructions, execution model, fine-grained parallelism, hardware queues, low-latency value transfer, multithreading, parallel processing, sequential code region execution, sequential execution, single thread performance acceleration, time spent minimization
