# DRiLLS - Deep Reinforcement Learning for Logic Synthesis

ASP-DAC, pp. 581-586, 2020.

Abstract:

Logic synthesis requires extensive tuning of the synthesis optimization flow, where the quality of results (QoR) depends on the sequence of optimizations used. Efficient design space exploration is challenging due to the exponential number of possible optimization permutations. Therefore, automating the optimization process is necessary.


Introduction

- Logic synthesis transforms a high-level description of a design into an optimized gate-level representation.
- Modern logic synthesis tools represent a given design as an And-Inverter Graph (AIG), which encodes representative characteristics for optimizing Boolean functions.
- Logic synthesis mainly consists of three tightly-coupled steps, namely pre-mapping optimizations, technology mapping, and post-mapping optimizations.
- In the pre-mapping optimization phase, technology-independent transformations are performed on the AIG to reduce the graph size, resulting in smaller total area, while adhering to a delay constraint.
- A policy is defined as a mapping M that, for each given state, assigns a probability mass function M(·|s) over actions [13].
- In value-based methods (e.g. Q-learning), the system learns a value function that maps state-action pairs to a scalar value [14], and picks the action with the maximum value over all possible actions.
- Actor-Critic algorithms [13], as a hybrid class, combine the benefits of both aforementioned classes.
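The three RL families above can be sketched on a toy discrete problem. This is a minimal illustration, not the paper's implementation; the state names and action names below are assumptions chosen for the example.

```python
# Minimal sketch of the three RL classes mentioned above, on a toy
# problem with discrete states and actions (illustrative names only).

import random

ACTIONS = ["rewrite", "refactor", "balance"]  # hypothetical action set

# Policy-based view: a policy maps each state to a probability mass
# function over actions, pi(a | s).
policy = {"s0": {"rewrite": 0.5, "refactor": 0.3, "balance": 0.2}}

def sample_action(state):
    pmf = policy[state]
    return random.choices(list(pmf), weights=pmf.values(), k=1)[0]

# Value-based view (Q-learning): learn Q(s, a) and act greedily over
# all possible actions in the current state.
Q = {("s0", a): 0.0 for a in ACTIONS}

def greedy_action(state):
    return max(ACTIONS, key=lambda a: Q[(state, a)])

# Actor-Critic view: keep both a policy (actor) and a state-value
# function V (critic); the critic's TD error guides the actor's update.
V = {"s0": 0.0, "s1": 0.0}

def td_error(state, reward, next_state, gamma=0.99):
    return reward + gamma * V[next_state] - V[state]
```

An Advantage Actor Critic (A2C) agent, as used in DRiLLS, uses this TD-error/advantage signal to scale the policy-gradient update of the actor while the critic regresses toward the observed returns.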

Highlights

- Logic synthesis transforms a high-level description of a design into an optimized gate-level representation
- We introduce DRiLLS (Deep Reinforcement Learning-based Logic Synthesis), a novel framework based on reinforcement learning developed for generating logic synthesis optimization flows
- Our work differs from previous work in that we propose a reinforcement learning agent to explore the search space, optimizing particular synthesis metrics and enabling variable-length optimization flows, without requiring sample flows for training
- There are two major components in the framework: Logic Synthesis environment, which is a setup of the design space exploration problem as a reinforcement learning task, and Reinforcement Learning environment, which employs an Advantage Actor Critic agent (A2C) to navigate the environment searching for the best optimization at a given state
- The intuition behind modeling this problem in a reinforcement learning context is to give the machine a trial-and-error methodology, similar to how human experts gain their experience optimizing designs
- We have presented a methodology based on reinforcement learning that enables autonomous and efficient exploration of the logic synthesis design space

Methods

- DRiLLS, standing for Deep Reinforcement Learning-based Logic Synthesis, effectively maps the design space exploration problem to a game environment.
- There are two major components in the framework: Logic Synthesis environment, which is a setup of the design space exploration problem as a reinforcement learning task, and Reinforcement Learning environment, which employs an Advantage Actor Critic agent (A2C) to navigate the environment searching for the best optimization at a given state.
- The authors discuss both components and the interaction between them in detail
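The interaction between the two components can be sketched as an episode loop: at each step the agent picks one optimization, the synthesis environment applies it and returns the new design statistics plus a reward. The sketch below mocks the environment with a dict of AIG statistics; a real environment would invoke ABC, and the random action choice stands in for the trained A2C policy. The class and variable names are assumptions for illustration.

```python
# Hedged sketch of the DRiLLS interaction loop. The environment here
# is a mock (no real ABC calls); rewrite/refactor/resub/balance are
# typical ABC pre-mapping passes.

import random

ABC_OPTIMIZATIONS = ["rewrite", "rewrite -z", "refactor", "refactor -z",
                     "resub", "resub -z", "balance"]

class MockSynthesisEnv:
    """Stand-in for the Logic Synthesis environment."""

    def __init__(self, nodes=1000, levels=40):
        self.state = {"nodes": nodes, "levels": levels}

    def step(self, action):
        prev = dict(self.state)
        # Pretend each pass shrinks the AIG by a small random factor.
        self.state["nodes"] = int(self.state["nodes"] * random.uniform(0.9, 1.0))
        self.state["levels"] = max(1, self.state["levels"] - random.randint(0, 1))
        reward = prev["nodes"] - self.state["nodes"]  # node-count reduction as proxy
        return dict(self.state), reward

env = MockSynthesisEnv()
flow, total_reward = [], 0
for _ in range(10):                            # fixed-length episode for the sketch
    action = random.choice(ABC_OPTIMIZATIONS)  # a trained A2C agent would choose here
    state, reward = env.step(action)
    flow.append(action)
    total_reward += reward
# `flow` is the discovered optimization sequence for this episode.
```

The key modeling choice is that the state the agent observes is a vector of design statistics (e.g. node and level counts) rather than the graph itself, which keeps the observation size fixed across designs.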

Results

- The authors demonstrate the proposed methodology by utilizing the open-source synthesis framework ABC v1.01 [17].
- The authors implement DRiLLS in Python v3.5.2 and utilize TensorFlow r1.12 [18] to train the A2C agent neural networks.
- All experiments are synthesized using ASAP7, a 7 nm standard cell library, in the typical process corner.
- The authors evaluate the framework on the EPFL arithmetic benchmarks [5], which exhibit a wide range of circuit characteristics.
- The characteristics of the evaluated benchmarks (e.g. I/Os, number of nodes, edges and levels) can be found in [5].

Conclusion

- The goal of developing DRiLLS is to offer an autonomous framework that is able to explore the optimization space of a given circuit design and produce a high Quality of Result (QoR) with no human in the loop.

[Figure: area-delay plots per benchmark, (a) Max, (b) Square-root, (c) Log2, (d) Sin, (e) Multiplier, (f) Square, comparing Greedy, Expert-crafted Scripts, EPFL Best Result, DRiLLS Exploration Space, and DRiLLS Best Result against the Delay Constraint; x-axis: Delay.]

- The intuition behind modeling this problem in a reinforcement learning context is to give the machine a trial-and-error methodology, similar to how human experts gain their experience optimizing designs.
- The authors have presented a methodology based on reinforcement learning that enables autonomous and efficient exploration of the logic synthesis design space.
- It allows the agent to find a minimum design area subject to a delay constraint.
- Evaluating ten representative benchmarks, the proposed methodology produces results that outperform existing methods


Tables

- Table 1: Formulation of the multi-objective reward function (Decr. = Decrease, Incr. = Increase)
- Table 2: Area-delay comparison of logic synthesis optimization results. A greedy algorithm optimizes for area. Expert-crafted scripts are derived from [6]. EPFL best results for size are available at [5]
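The idea behind the Table 1 reward can be sketched as a function of three signals: whether area decreased, whether delay decreased, and whether the delay constraint is currently met. The numeric values below are illustrative assumptions, not the paper's exact table entries.

```python
# Hedged sketch of a multi-objective reward in the spirit of Table 1.
# The weights are assumptions chosen to show the structure, not the
# paper's actual values.

def reward(area_decreased, delay_decreased, constraint_met):
    if constraint_met:
        # Within the delay budget, area reduction is the main objective.
        return (3 if area_decreased else -1) + (1 if delay_decreased else 0)
    # Outside the budget, moving delay back toward the constraint dominates.
    return (2 if delay_decreased else -2) + (1 if area_decreased else 0)
```

The asymmetry is the point: the same area move is rewarded differently depending on which side of the delay constraint the design sits, which is what steers the agent toward minimum area subject to the constraint.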

Funding

- This work is supported by DARPA (HR0011-18-2-0032)

References

- C. Yu, H. Xiao, and G. De Micheli, “Developing synthesis flows without human knowledge,” in Design Automation Conference, ser. DAC ’18. ACM, 2018, pp. 50:1–50:6.
- M. M. Ziegler, H.-Y. Liu et al., “A synthesis-parameter tuning system for autonomous design-space exploration,” in DATE, 2016, pp. 1148–1151.
- D. Silver, J. Schrittwieser et al., “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, p. 354, 2017.
- M. Jaderberg, W. M. Czarnecki, Dunning et al., “Human-level performance in first-person multiplayer games with population-based deep reinforcement learning,” arXiv preprint arXiv:1807.01281, 2018.
- L. Amaru, P.-E. Gaillardon, and G. De Micheli, “The EPFL combinational benchmark suite,” in IWLS, 2015.
- W. Yang, L. Wang, and A. Mishchenko, “Lazy man’s logic synthesis,” in ICCAD. IEEE, 2012, pp. 597–604.
- E. İpek, S. A. McKee, R. Caruana, B. R. de Supinski, and M. Schulz, “Efficiently exploring architectural design spaces via predictive modeling,” SIGPLAN Not., vol. 41, no. 11, pp. 195–206, Oct. 2006.
- B. Ozisikyilmaz, G. Memik, and A. Choudhary, “Efficient system design space exploration using machine learning techniques,” in 45th ACM/IEEE Design Automation Conference, June 2008, pp. 966–969.
- H.-Y. Liu and L. P. Carloni, “On learning-based methods for designspace exploration with high-level synthesis,” in Design Automation Conference, May 2013, pp. 1–7.
- M. M. Ziegler, H.-Y. Liu, and L. P. Carloni, “Scalable auto-tuning of synthesis parameters for optimizing high-performance processors,” in ACM International Symposium on Low Power Electronics and Design, 2016, pp. 180–185.
- V. Mnih, K. Kavukcuoglu et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, p. 529, 2015.
- T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.
- V. R. Konda and J. N. Tsitsiklis, “Actor-critic algorithms,” in Advances in neural information processing systems, 2000, pp. 1008–1014.
- C. J. Watkins and P. Dayan, “Q-learning,” Machine learning, vol. 8, no. 3-4, pp. 279–292, 1992.
- R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” in Advances in neural information processing systems, 2000, pp. 1057–1063.
- V. R. Konda and J. N. Tsitsiklis, “On actor-critic algorithms,” SIAM Journal on Control and Optimization, vol. 42, no. 4, pp. 1143–1166, 2003.
- A. Mishchenko et al., “ABC: A system for sequential synthesis and verification,” URL http://www.eecs.berkeley.edu/alanmi/abc, pp. 1–17, 2007.
- M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015. [Online]. Available: https://www.tensorflow.org/
- X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the thirteenth international conference on artificial intelligence and statistics, 2010, pp. 249–256.
- D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
