Exploiting Staleness for Approximating Loads on CMPs.

Parallel Architectures and Compilation Techniques (2015)

Abstract
Coherence misses are an important factor in limiting the scalability of multi-threaded shared memory applications on chip multiprocessors (CMPs) that are envisaged to contain dozens of cores in the imminent future. This paper proposes a novel approach to tackling this problem by leveraging the growingly important paradigm of approximate computing. Many applications are either tolerant to slight errors in the output or, if stringent, have in-built resiliency to tolerate some errors in the execution. The approximate computing paradigm suggests breaking conventional barriers of mandating stringent correctness on the hardware, allowing more flexibility in the performance-power-reliability design space. Taking the multi-threaded applications in the SPLASH-2 benchmark suite, we note that nearly all these applications have such inherent resiliency and/or tolerance to slight errors in the output. Based on this observation, we propose to approximate coherence-related load misses by returning stale values, i.e., the version at the time of the invalidation. We show that returning such values from the invalidated lines already present in d-L1 offers only limited scope for improvement since those lines get evicted fairly soon due to the high pressure on d-L1. Instead, we propose a very small (8 lines) Stale Victim Cache (SVC), to hold such lines upon d-L1 eviction. While this does offer significant improvement, there is the possibility of data getting very stale in such a structure, making it highly sensitive to the choice of what data to keep, and for how long. To address these concerns, we propose to time-out these lines from the SVC to limit their staleness in a mechanism called SVC+TB. We show that SVC+TB provides as much as 28.6% speedup in some SPLASH-2 applications, with an average speedup between 10-15% across the entire suite, becoming comparable to an ideal execution that does not incur coherence misses. Further, the consequent approximations have little impact on the correctness, allowing all of them to complete. There were no errors, because of inherent application resilience, in eleven applications, and the maximum error was at most 0.08% across the entire suite.
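The idea of a small stale-line buffer with a time-out can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's hardware design: the class name `StaleVictimCache`, the `timeout` parameter and its default, the FIFO replacement policy, and the choice to measure staleness from the invalidation time are all hypothetical. Only the 8-line capacity and the notion of timing out stale lines come from the abstract.

```python
from collections import OrderedDict


class StaleVictimCache:
    """Toy model of a small buffer holding invalidated lines evicted from d-L1."""

    def __init__(self, capacity=8, timeout=1000):
        self.capacity = capacity    # 8 lines, as stated in the abstract
        self.timeout = timeout      # hypothetical staleness bound, in cycles
        self.lines = OrderedDict()  # addr -> (stale_value, invalidated_at)

    def insert(self, addr, stale_value, invalidated_at):
        """Record an invalidated line at the moment d-L1 evicts it."""
        self.lines.pop(addr, None)
        if len(self.lines) >= self.capacity:
            # FIFO replacement is an assumption made for this sketch.
            self.lines.popitem(last=False)
        self.lines[addr] = (stale_value, invalidated_at)

    def lookup(self, addr, now):
        """Return a stale value for a coherence load miss, or None."""
        entry = self.lines.get(addr)
        if entry is None:
            return None
        value, invalidated_at = entry
        if now - invalidated_at > self.timeout:
            # Timed out: the line is too stale to be used as an approximation.
            del self.lines[addr]
            return None
        return value
```

In this model, a load that misses in d-L1 because of an invalidation would call `lookup` first and fall back to the normal coherence request only when it returns `None`.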
Keywords
Approximate Computing, Coherence, Caches