Energy efficient load latency tolerance: single-thread performance for the multi-core era

Energy efficient load latency tolerance: single-thread performance for the multi-core era(2010)

引用 22|浏览28
暂无评分
摘要
Around 2003, newly activated power constraints caused single-thread performance growth to slow dramatically. The multi-core era was born with an emphasis on explicitly parallel software. Continuing to grow single-thread performance is still important in the multi-core context, but it must be done in an energy efficient way. One significant impediment to performance growth in both out-of-order and in-order processors is the long latency of last-level cache misses. Prior work introduced the idea of load latency tolerance—the ability to dynamically remove miss-dependent instructions from critical execution structures, continue execution under the miss, and re-execute miss-dependent instructions after the miss returns. However, previously proposed designs were unable to improve performance in an energy-efficient way—they introduced too many new large, complex structures and re-executed too many instructions. This dissertation describes a new load latency tolerant design that is both energy-efficient, and applicable to both in-order and out-of-order cores. Key novel features include formulation of slice re-execution as an alternative use of multi-threading support, efficient schemes for register and memory state management, and new pruning mechanisms for drastically reducing load latency tolerance’s dynamic execution overheads. Area analysis shows that energy-efficient load latency tolerance increases the footprint of an out-of-order core by a few percent, while cycle-level simulation shows that it significantly improves the performance of memory-bound programs. Energy-efficient load latency tolerance is more energy-efficient than—and synergistic with—existing performance technique like dynamic voltage and frequency scaling (DVFS).
更多
查看译文
关键词
energy efficient load latency,single-thread performance growth,multi-core era,critical execution structure,new load latency tolerant,out-of-order core,energy-efficient load latency tolerance,existing performance technique,performance growth,single-thread performance,long latency,load latency tolerance
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要