Physically-coupled replication and resynthesis (2008)
Abstract
In modern digital systems, performance is increasingly determined by communication delays rather than logic delays. As a result, optimizations made by traditional logic synthesis tools tend to correlate poorly with post-layout performance. One mechanism for dealing with this disconnect is to replicate and replace multiple logic gates in the design after an initial placement. Of particular interest is the approach of Hrkic et al., in which placement-level static timing analysis is performed and timing-critical fan-in trees are induced via (temporary) replication; each tree is then optimally embedded into the layout area by a dynamic programming algorithm, and cell duplication and unification are implied by the embedding result. This method is referred to as Replication Tree Embedding. We study this optimization problem further and propose a number of techniques aimed at more fully realizing the potential of the methodology. The main technique is rectilinear Steiner arborescence embedding, which overcomes the limitation imposed by the reconvergence effect in the netlist. Other techniques are fanout partitioning and cell relocation, both of which are cognizant of wire-length and timing impact for improved solution quality. The effects of these techniques, including a new replication cost computation and lower-bounding of the clock period, are reported. The basic framework of the replication tree enables yet more general logic optimizations while retaining tight coupling with placement: we generalize the basic replication tree embedding approach so that it also performs remapping. Timing-critical fan-in trees are reimplemented where the degrees of freedom include functional decomposition of Look-up Tables (LUTs), subject graph covering/mapping, and physical embedding. A dynamic programming algorithm optimizes over all of these freedoms simultaneously. All simple disjoint decompositions (i.e., Ashenhurst style) are encoded in the subject tree/graph using choice nodes. At the same time, because embedding is done simultaneously, interconnect delay is directly taken into account. These frameworks are implemented in the Field Programmable Gate Array (FPGA) domain. In many cases they approach a fixed flip-flop lower bound on the achievable clock period. Promising experimental results are reported, with an average 17.4% (up to 38.1%) clock period reduction compared with the timing-driven placement from Versatile Place and Route (VPR), and an average 6.6% (up to 17.5%) reduction compared with the replication tree embedder.
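To make the idea of embedding a fan-in tree by dynamic programming concrete, here is a minimal sketch in Python. It is not the algorithm of Hrkic et al. or of this work. It assumes a linear wire delay proportional to Manhattan distance, a uniform gate delay, and an unconstrained grid of slots, and it ignores replication cost, slot occupancy, and reconvergence. All names (`embed_fanin_tree`, `children`, `leaf_pos`, and so on) are hypothetical.

```python
from functools import lru_cache

def embed_fanin_tree(children, leaf_pos, leaf_arrival, root, root_pos,
                     grid_w, grid_h, gate_delay=1.0, wire_delay=1.0):
    """Place the internal nodes of a fan-in tree to minimize root arrival time.

    children:     dict mapping each internal node to a list of its fan-in nodes
    leaf_pos:     dict mapping each leaf (tree input) to its fixed (x, y) slot
    leaf_arrival: dict mapping each leaf to its signal arrival time
    root_pos:     fixed (x, y) slot of the tree root (the timing-critical sink)
    """
    slots = [(x, y) for x in range(grid_w) for y in range(grid_h)]

    def dist(p, q):
        # Manhattan distance between two slots (assumed wire-length model).
        return abs(p[0] - q[0]) + abs(p[1] - q[1])

    @lru_cache(maxsize=None)
    def arrival(node, pos):
        # Best achievable arrival time at the output of `node` placed at `pos`.
        if node in leaf_pos:
            return leaf_arrival[node] if pos == leaf_pos[node] else float("inf")
        worst = 0.0
        for child in children[node]:
            # Each child subtree is embedded independently at its best slot.
            best = min(arrival(child, q) + wire_delay * dist(pos, q)
                       for q in slots)
            worst = max(worst, best)
        return worst + gate_delay

    # Top-down traceback to recover one optimal placement.
    placement = {root: root_pos}
    stack = [(root, root_pos)]
    while stack:
        node, pos = stack.pop()
        for child in children.get(node, ()):
            q = min(slots,
                    key=lambda s: arrival(child, s) + wire_delay * dist(pos, s))
            placement[child] = q
            stack.append((child, q))
    return arrival(root, root_pos), placement


# Toy instance: two inputs feed gate "g", which feeds the fixed sink "r".
children = {"r": ["g"], "g": ["a", "b"]}
leaf_pos = {"a": (0, 0), "b": (4, 0)}
leaf_arrival = {"a": 0.0, "b": 0.0}
best_arrival, placement = embed_fanin_tree(
    children, leaf_pos, leaf_arrival, root="r", root_pos=(2, 4),
    grid_w=5, grid_h=5)
```

In this toy instance the gate `g` ends up in the column midway between its two inputs, and the arrival time at `r` is 8.0 under the assumed delay model. Because the structure is a tree, each child subtree can be optimized independently of its siblings, which is what makes the recurrence exact. Reconvergent fan-in breaks that independence, which is the limitation the abstract attributes to the netlist's reconvergence effect.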
Keywords
basic replication tree, timing-critical fan-in tree, replication tree embedder, rectilinear Steiner arborescence embedding, general logic optimizations, physical embedding, new replication cost computation, subject tree, replication tree, Physically-coupled replication, embedding result