Compiler and hardware support for reducing the synchronization of speculative threads

TACO(2008)

Citations: 32 | Views: 28
Abstract
Thread-level speculation (TLS) allows us to automatically parallelize general-purpose programs by supporting parallel execution of threads that might not actually be independent. In this article, we focus on one important limitation of program performance under TLS: stalls that result from synchronizing and forwarding scalar values between speculative threads, values that would otherwise cause frequent data dependences and, hence, failed speculation. Using SPECint benchmarks that have been automatically transformed by our compiler to exploit TLS, we present, evaluate in detail, and compare both compiler and hardware techniques for improving the communication of scalar values. We find that through our dataflow algorithms for three increasingly aggressive instruction scheduling techniques, the compiler can drastically reduce the critical forwarding path introduced by the synchronization and forwarding of scalar values. We also show that hardware techniques for reducing synchronization can be complementary to compiler scheduling, but that the additional performance benefits are minimal and are generally not worth the cost.
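To make the notion of a critical forwarding path concrete, the following is a minimal timing sketch. It is not the article's dataflow algorithm or its simulator. It assumes a toy model in which every speculative thread runs the same `length` cycles, waits for one forwarded scalar at cycle `wait_at`, and signals the value to its successor at cycle `signal_at`. The function name, parameters, and numbers are all hypothetical and chosen only for illustration.

```python
def simulate(num_threads, length, wait_at, signal_at,
             fwd_latency=10, spawn_delay=1):
    """Return the finish time of the last thread in a toy TLS timing model.

    Assumptions (illustrative only): threads start spawn_delay cycles apart,
    each thread waits once for a scalar from its predecessor and signals once
    to its successor, and forwarding a value takes fwd_latency cycles.
    """
    assert 0 <= wait_at <= signal_at <= length
    prev_signal = None  # time at which the predecessor signalled its value
    finish = 0
    for i in range(num_threads):
        reach_wait = i * spawn_delay + wait_at
        if prev_signal is None:
            # The first thread has no predecessor, so it never stalls.
            resume = reach_wait
        else:
            # Stall until the forwarded value has arrived.
            resume = max(reach_wait, prev_signal + fwd_latency)
        prev_signal = resume + (signal_at - wait_at)
        finish = resume + (length - wait_at)
    return finish


# Signal placed late: the wait-to-signal span covers most of the thread.
late = simulate(num_threads=4, length=100, wait_at=10, signal_at=90)

# Producer and signal scheduled right after the wait (hypothetical result
# of instruction scheduling): the wait-to-signal span shrinks to 5 cycles.
early = simulate(num_threads=4, length=100, wait_at=10, signal_at=15)

print(late, early)
```

In this model, when stalls dominate, each additional thread adds `signal_at - wait_at + fwd_latency` cycles to the total, so moving the signal earlier shortens the serialized portion of every thread. That is the intuition behind scheduling the instructions that produce a forwarded value as early as possible.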
Keywords
automatic parallelization, program performance, hardware support, critical forwarding path, forwarding scalar value, chip-multiprocessing, additional performance benefit, scalar value, aggressive instruction scheduling technique, speculative thread, SPECint benchmarks, hardware technique, thread-level speculation, failed speculation, instruction scheduling, chip