Resource sharing in modulo-scheduled reconfigurable architectures

(2011)

Abstract
This dissertation explores compiler algorithms for sharing resources in coarse-grained reconfigurable arrays (CGRAs). CGRAs are scalable, word-oriented architectures designed for executing high-performance computation kernels. Instead of having a single configuration like an FPGA or a program counter indexing arbitrary instruction words, a CGRA maintains several configurations on chip that are sequenced through by a modulo-counter, changing the configuration each cycle. This model of execution works well on pipelined compute-intensive loops with a large amount of instruction level parallelism and limited control flow. CGRAs provide high-efficiency, high-throughput computing, achieving an order of magnitude improvement in operations per cycle over conventional CPUs.

Even with their considerable strengths, CGRAs' flexibility and efficiency can still be improved through compiler-based resource sharing. In this dissertation I propose and demonstrate the following novel sharing techniques applicable to this execution model:

• Sharing Static Routing – Reducing the number of bits needed per configuration can save area and power, or be used to allow for more configurations, increasing flexibility. One way of reducing the number of control bits is to limit portions of the interconnect to a single, repeated configuration while the rest of the array is free to use multiple configurations. Towards this goal, I propose an extension to the PathFinder/QuickRoute routing algorithms for supporting sharing of statically configured pipelined routing resources in a time-multiplexed system.
• Predicate Aware Sharing of Compute and Routing Resources – The basic modulo-scheduled execution model can efficiently pipeline and execute a simple loop. CGRAs often support complex control flow by reserving resources to perform all computations, and then ignoring the results of the untraversed control paths. To reduce this overhead, I propose a scalable hardware modification, hardware abstractions, and a set of Schedule/Place/Route algorithms capable of predicate-aware mapping. This system allows sharing of resources across operations executed under mutually-exclusive control flow – for example, reusing resources across then and else branches of an if construct. It achieves this sharing by exploiting otherwise wasted configuration memory.

These sharing techniques provide more efficient use of CGRA resources. Sharing static routing helps reduce the large configurations needed for CGRAs. Mutual-exclusive sharing reduces the burden of control flow, which can broaden the set of applications CGRAs can accelerate, and provide some flexibility to the programmer. It allows the programmer to handle infrequent or exceptional cases directly on the accelerator without forcing portions of the accelerator to remain idle waiting for those cases. These algorithms are implemented and evaluated across a suite of benchmarks to demonstrate the benefits of sharing in CGRAs.
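
The two ideas the abstract combines, a modulo-counter that selects one of several configurations each cycle and the sharing of a resource between mutually exclusive branches, can be illustrated with a toy modulo reservation table. This is only a sketch under simplifying assumptions and is not the dissertation's Schedule/Place/Route algorithm: the names `Op`, `mutually_exclusive`, and `ModuloReservationTable` are hypothetical, each operation is guarded by at most one predicate, and routing is ignored entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical operation record (not taken from the dissertation)."""
    name: str
    # (condition name, required value), or None for an unconditional op.
    pred: Optional[Tuple[str, bool]] = None


def mutually_exclusive(a: Op, b: Op) -> bool:
    """True when a and b are guarded by opposite values of the same condition."""
    if a.pred is None or b.pred is None:
        return False
    return a.pred[0] == b.pred[0] and a.pred[1] != b.pred[1]


class ModuloReservationTable:
    """Toy table: one entry per (configuration slot, functional unit)."""

    def __init__(self, ii: int, num_units: int) -> None:
        self.ii = ii  # number of configurations cycled through by the modulo-counter
        self.slots: Dict[Tuple[int, int], List[Op]] = {
            (s, u): [] for s in range(ii) for u in range(num_units)
        }

    def try_place(self, op: Op, cycle: int, unit: int) -> bool:
        # The modulo-counter maps an absolute cycle onto a configuration slot.
        slot = self.slots[(cycle % self.ii, unit)]
        # Sharing is allowed only if op is exclusive with every current occupant.
        if all(mutually_exclusive(op, other) for other in slot):
            slot.append(op)
            return True
        return False


mrt = ModuloReservationTable(ii=2, num_units=1)
placed_then = mrt.try_place(Op("then_add", ("c", True)), cycle=0, unit=0)
placed_else = mrt.try_place(Op("else_sub", ("c", False)), cycle=2, unit=0)
placed_mul = mrt.try_place(Op("mul"), cycle=4, unit=0)
```

In this sketch the `then` and `else` operations land in the same slot (cycles 0 and 2 both map to configuration 0), while the unconditional `mul` is rejected there because it would execute in every iteration and must be placed in a different slot or on another unit.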
Keywords
control flow, control bit, sharing technique, complex control flow, compiler-based resource sharing, modulo-scheduled reconfigurable architecture, applications CGRAs, Static Routing, limited control flow, Mutual-exclusive sharing, mutually-exclusive control flow