DRiLLS - Deep Reinforcement Learning for Logic Synthesis

Abdelrahman Hosny
Soheil Hashemi

ASP-DAC, pp. 581-586, 2020.


Abstract:

Logic synthesis requires extensive tuning of the synthesis optimization flow where the quality of results (QoR) depends on the sequence of optimizations used. Efficient design space exploration is challenging due to the exponential number of possible optimization permutations. Therefore, automating the optimization process is necessary. I…

Introduction
  • Logic synthesis transforms a high-level description of a design into an optimized gate-level representation.
  • Modern logic synthesis tools represent a given design as an And-Inverter Graph (AIG), which encodes representative characteristics for optimizing Boolean functions.
  • Logic Synthesis mainly consists of three tightly-coupled steps, namely pre-mapping optimizations, technology mapping, and post-mapping optimizations.
  • In the pre-mapping optimization phase, technology-independent transformations are performed on the AIG to reduce the graph size, resulting in a smaller total area while adhering to a delay constraint.
  • A policy is defined as a mapping M that, for each given state, assigns a probability mass function M(·|s) over the possible actions [13].
  • In value-based methods (e.g. Q-learning), the system learns a value function that maps state-action pairs to a single value [14], and the agent picks the action with the maximum value over all possible actions.
  • Actor Critic algorithms [13], as a hybrid class, combine the benefits of both aforementioned classes (a minimal illustrative sketch follows this list).
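
A minimal illustrative sketch, not the paper's implementation (which trains neural networks in TensorFlow), of the quantities an Advantage Actor Critic agent maintains: an actor that assigns a probability mass function over actions for each state, a critic that estimates state values, and an advantage term that scales the policy update. The linear models and learning rate here are simplifying assumptions.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

class TinyA2C:
    """Toy Advantage Actor Critic with a linear actor and a linear critic."""

    def __init__(self, n_state_features, n_actions, gamma=0.99, lr=0.01):
        self.W_actor = np.zeros((n_actions, n_state_features))  # policy parameters
        self.w_critic = np.zeros(n_state_features)              # value parameters
        self.gamma, self.lr = gamma, lr

    def policy(self, state):
        # M(.|s): probability mass function over actions for the given state.
        return softmax(self.W_actor @ state)

    def value(self, state):
        # Critic's estimate of the expected return from this state.
        return self.w_critic @ state

    def update(self, state, action, reward, next_state):
        # Advantage (temporal-difference error): how much better the taken
        # action did than the critic expected.
        advantage = reward + self.gamma * self.value(next_state) - self.value(state)
        # Critic: move the value estimate toward the bootstrapped target.
        self.w_critic += self.lr * advantage * state
        # Actor: policy-gradient step on the log-probability of the taken action.
        probs = self.policy(state)
        grad_log = -np.outer(probs, state)
        grad_log[action] += state
        self.W_actor += self.lr * advantage * grad_log
        return advantage
```
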
Highlights
  • Logic synthesis transforms a high-level description of a design into an optimized gate-level representation
  • We introduce DRiLLS (Deep Reinforcement Learning-based Logic Synthesis), a novel framework based on reinforcement learning developed for generating logic synthesis optimization flows
  • Our work differs from previous work in that we use a reinforcement learning agent to explore the search space, optimizing particular synthesis metrics and enabling variable-length optimization flows, without requiring sample flows for training
  • There are two major components in the framework: Logic Synthesis environment, which is a setup of the design space exploration problem as a reinforcement learning task, and Reinforcement Learning environment, which employs an Advantage Actor Critic agent (A2C) to navigate the environment searching for the best optimization at a given state
  • The intuition behind modeling this problem in a reinforcement learning context is to provide the machine with a methodology for trial and error, similar to how human experts gain their experience optimizing designs
  • We have presented a methodology based on reinforcement learning that enables autonomous and efficient exploration of the logic synthesis design space
Methods
  • DRiLLS, standing for Deep Reinforcement Learning-based Logic Synthesis, effectively maps the design space exploration problem to a game environment.
  • There are two major components in the framework: Logic Synthesis environment, which is a setup of the design space exploration problem as a reinforcement learning task, and Reinforcement Learning environment, which employs an Advantage Actor Critic agent (A2C) to navigate the environment searching for the best optimization at a given state.
  • The authors discuss both components and the interaction between them in detail (a minimal sketch of this interaction loop follows this list)
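
A sketch of how the two components could interact, assuming the synthesis environment exposes a gym-style reset()/step() interface and the agent provides the policy() and update() methods of the toy A2C above. The episode length and the names (env, run_episode) are illustrative assumptions, not the paper's code.

```python
import numpy as np

def run_episode(env, agent, n_steps=25):
    """Run one optimization flow: the agent picks one synthesis
    transformation per step for a fixed-length episode."""
    state = env.reset()                        # e.g. initial AIG statistics
    total_reward = 0.0
    for _ in range(n_steps):
        probs = agent.policy(state)            # PMF over available optimizations
        action = np.random.choice(len(probs), p=probs)
        next_state, reward = env.step(action)  # apply one optimization, observe new stats
        agent.update(state, action, reward, next_state)
        state = next_state
        total_reward += reward
    return total_reward
```
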
Results
  • The authors demonstrate the proposed methodology by utilizing the open-source synthesis framework ABC v1.01 [17] (an illustrative sketch of driving ABC follows this list).
  • The authors implement DRiLLS in Python v3.5.2 and utilize TensorFlow r1.12 [18] to train the A2C agent neural networks.
  • All experiments are synthesized using ASAP7, a 7 nm standard cell library, in the typical process corner.
  • The authors evaluate the framework on the EPFL arithmetic benchmarks [5], which exhibit a wide range of circuit characteristics.
  • The characteristics of the evaluated benchmarks (e.g. I/Os, number of nodes, edges and levels) can be found in [5].
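
A sketch of one way a Python environment step could invoke ABC on a design and collect statistics. The commands used (read, strash, rewrite, refactor, resub, balance, print_stats) are standard ABC commands, but the file names, the example flow, and the output handling are assumptions for illustration; mapping to the ASAP7 library to obtain area and delay, as done in the paper, is omitted here.

```python
import subprocess

# Candidate AIG optimizations an agent could choose from (standard ABC commands).
OPTIMIZATIONS = ["rewrite", "rewrite -z", "refactor", "refactor -z",
                 "resub", "resub -z", "balance"]

def run_abc_flow(design_file, flow):
    """Apply a sequence of optimizations to a design in ABC and return the
    raw text of print_stats (node/level counts of the optimized AIG)."""
    commands = "; ".join(["read " + design_file, "strash"]
                         + list(flow)
                         + ["print_stats"])
    result = subprocess.run(["abc", "-c", commands],
                            stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout

if __name__ == "__main__":
    # Hypothetical example: apply three transformations to a benchmark AIG.
    print(run_abc_flow("max.aig", ["rewrite", "balance", "refactor -z"]))
```
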
Conclusion
  • The goal of developing DRiLLS is to offer an autonomous framework that is able to explore the optimization space of a given circuit design and produce a high Quality of Result (QoR) with no human in the loop.

    [Figure: area-delay exploration for the (a) Max, (b) Square-root, (c) Log2, (d) Sin, (e) Multiplier and (f) Square benchmarks, comparing Greedy, Expert-crafted Scripts, EPFL Best Result, and the DRiLLS exploration space and best result against the delay constraint; the x-axis is Delay.]
  • The intuition behind modeling this problem in a reinforcement learning context is to provide the machine with a methodology for trial and error, similar to how human experts gain their experience optimizing designs.
  • The authors have presented a methodology based on reinforcement learning that enables autonomous and efficient exploration of the logic synthesis design space.
  • It allows the agent to find a minimum design area subject to a delay constraint.
  • Evaluated on ten representative benchmarks, the proposed methodology yields results that outperform existing methods.
Tables
  • Table 1: Formulation of the multi-objective reward function. "Decr." stands for Decrease and "Incr." stands for Increase (an illustrative sketch follows this list).
  • Table 2: Area-delay comparison of logic synthesis optimization results. A greedy algorithm optimizes for area. Expert-crafted scripts are derived from [6]. EPFL best results for size are available at [5].
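
An illustrative sketch in the spirit of Table 1's multi-objective reward: the reward depends on whether the mapped area decreased and whether the delay constraint is met. The specific numeric values below are placeholders, not the paper's tabulated rewards.

```python
def reward(prev_area, new_area, new_delay, delay_constraint):
    """Toy multi-objective reward: favor area reduction subject to the delay constraint."""
    area_decreased = new_area < prev_area
    meets_delay = new_delay <= delay_constraint
    if meets_delay and area_decreased:
        return 1.0    # best case: smaller area within the delay constraint
    if area_decreased:
        return 0.1    # area improved but the delay constraint is violated
    if meets_delay:
        return -0.1   # constraint met but area did not improve
    return -1.0       # worst case: larger area and constraint violated
```
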
Funding
  • This work is supported by DARPA (HR0011-18-2-0032)
References
  • [1] C. Yu, H. Xiao, and G. De Micheli, "Developing synthesis flows without human knowledge," in Design Automation Conference, ser. DAC '18. ACM, 2018, pp. 50:1–50:6.
  • [2] M. M. Ziegler, H.-Y. Liu et al., "A synthesis-parameter tuning system for autonomous design-space exploration," in DATE, 2016, pp. 1148–1151.
  • [3] D. Silver, J. Schrittwieser et al., "Mastering the game of go without human knowledge," Nature, vol. 550, no. 7676, p. 354, 2017.
  • [4] M. Jaderberg, W. M. Czarnecki, Dunning et al., "Human-level performance in first-person multiplayer games with population-based deep reinforcement learning," arXiv preprint arXiv:1807.01281, 2018.
  • [5] L. Amaru, P.-E. Gaillardon, and G. De Micheli, "The EPFL combinational benchmark suite," in IWLS, 2015.
  • [6] W. Yang, L. Wang, and A. Mishchenko, "Lazy man's logic synthesis," in ICCAD. IEEE, 2012, pp. 597–604.
  • [7] E. İpek, S. A. McKee, R. Caruana, B. R. de Supinski, and M. Schulz, "Efficiently exploring architectural design spaces via predictive modeling," SIGPLAN Not., vol. 41, no. 11, pp. 195–206, Oct. 2006.
  • [8] B. Ozisikyilmaz, G. Memik, and A. Choudhary, "Efficient system design space exploration using machine learning techniques," in 45th ACM/IEEE Design Automation Conference, June 2008, pp. 966–969.
  • [9] H.-Y. Liu and L. P. Carloni, "On learning-based methods for design-space exploration with high-level synthesis," in Design Automation Conference, May 2013, pp. 1–7.
  • [10] M. M. Ziegler, H.-Y. Liu, and L. P. Carloni, "Scalable auto-tuning of synthesis parameters for optimizing high-performance processors," in ACM International Symposium on Low Power Electronics and Design, 2016, pp. 180–185.
  • [11] V. Mnih, K. Kavukcuoglu et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, p. 529, 2015.
  • [12] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," arXiv preprint arXiv:1509.02971, 2015.
  • [13] V. R. Konda and J. N. Tsitsiklis, "Actor-critic algorithms," in Advances in Neural Information Processing Systems, 2000, pp. 1008–1014.
  • [14] C. J. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
  • [15] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Advances in Neural Information Processing Systems, 2000, pp. 1057–1063.
  • [16] V. R. Konda and J. N. Tsitsiklis, "On actor-critic algorithms," SIAM Journal on Control and Optimization, vol. 42, no. 4, pp. 1143–1166, 2003.
  • [17] A. Mishchenko et al., "ABC: A system for sequential synthesis and verification," URL http://www.eecs.berkeley.edu/alanmi/abc, pp. 1–17, 2007.
  • [18] M. Abadi et al., "TensorFlow: Large-scale machine learning on heterogeneous systems," 2015. [Online]. Available: https://www.tensorflow.org/
  • [19] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.
  • [20] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.