Memory Latency-Tolerance Approaches for Itanium Processors: Out-of-Order Execution vs. Speculative Precomputation

HPCA '02: Proceedings of the 8th International Symposium on High-Performance Computer Architecture (2002)

Abstract
The performance of in-order execution Itanium(TM) processors can suffer significantly due to cache misses. Two memory latency-tolerance approaches can be applied to Itanium processors. One uses an out-of-order (OOO) execution core; the other assumes multithreading support and exploits cache prefetching via speculative precomputation (SP). This paper evaluates and contrasts these two approaches. In addition, it assesses the effectiveness of combining them. For a select set of memory-intensive programs, an in-order SMT Itanium processor using speculative precomputation can achieve a performance improvement (92%) comparable to that of an out-of-order design (87%). Applying both OOO and SP yields a total performance improvement of 141% over the baseline in-order machine. OOO tends to be effective in prefetching for L1 misses, whereas SP is primarily good at covering L2 and L3 misses. Our analysis indicates that the two approaches can be redundant or complementary, depending on the type of delinquent loads that each targets. Both approaches are effective on delinquent loads in the loop body; however, only SP is effective on delinquent loads found in loop control code.
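To make the distinction between loop-body and loop-control loads concrete, the following is a minimal sketch in C. It is not taken from the paper: the `struct node` type, the `sum_list` function, and the choice of a pointer-chasing loop as a representative case are all assumptions made for illustration. The load of `n->next` decides whether and where the next iteration runs, so it belongs to the loop control; the load of `table[n->key]` only feeds the computation inside the loop body.

```c
#include <stddef.h>

/* Hypothetical node type, for illustration only (not from the paper). */
struct node {
    struct node *next;  /* read by the loop control to find the next node */
    size_t key;         /* index into a separate table */
};

/* Sum the table entries selected by the keys stored in a linked list. */
long sum_list(const struct node *head, const long *table)
{
    long sum = 0;
    /* Loop-control load: n->next must be known before the next iteration starts. */
    for (const struct node *n = head; n != NULL; n = n->next) {
        /* Loop-body load: table[n->key] does not determine the next iteration. */
        sum += table[n->key];
    }
    return sum;
}
```

In a loop of this shape, each iteration's address depends on the previous iteration's `n->next`, which is the kind of load the abstract describes as found in loop control code.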
Keywords
out of order, memory latency, multithreading, out-of-order execution