KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge

IJCAI, pp. 2291-2297, 2020.

Keywords:
new task, empirical result, learning process, fuzzy rule, human suboptimal knowledge

Abstract:

Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of humans: when faced with a new task, humans naturally use common sense and prior knowledge to derive an initial policy and guide the learning process...

Introduction
  • Deep reinforcement learning (DRL) algorithms have been applied in a range of challenging domains, from playing games [Mnih et al., 2015; Silver et al., 2016] to robotic control [Schulman et al., 2015a].
  • Humans naturally leverage knowledge obtained in similar tasks to derive a rough policy and guide the subsequent learning process.
  • Although this knowledge may not be completely compatible with the new task, humans can adjust the policy during the subsequent learning process.
  • Integrating human knowledge into reinforcement learning algorithms is therefore a promising way to accelerate the learning process.
Highlights
  • Deep reinforcement learning (DRL) algorithms have been applied in a range of challenging domains, from playing games [Mnih et al., 2015; Silver et al., 2016] to robotic control [Schulman et al., 2015a].
  • The combination of RL and high-capacity function approximators such as neural networks holds the promise of automating a wide range of decision-making and control tasks, but the application of these methods in real-world domains has been hampered by the challenge of sample complexity.
  • We propose a novel knowledge guided policy network (KoGuN), which can integrate human knowledge into RL algorithms in an end-to-end manner.
  • We evaluate our method on discrete and continuous control tasks, and the experimental results show that our approach achieves significant improvements in the learning efficiency of RL algorithms.
  • We propose a novel policy network framework called knowledge guided policy network to leverage human knowledge to accelerate the learning process of RL agents.
  • We evaluate our method on both discrete and continuous tasks, and the experimental results show that our method can significantly improve the learning efficiency of RL agents even with very low-performance human prior knowledge.
Methods
  • The authors first evaluate the algorithm on four tasks in Section 4.1: CartPole [Barto and Sutton, 1982], LunarLander and LunarLanderContinuous in OpenAI Gym [Brockman et al., 2016], and FlappyBird in PLE [Tasfi, 2016].
  • The authors show the effectiveness and robustness of KoGuN in the sparse-reward setting in Section 4.2.
  • For PPO without KoGuN, the authors use a neural network with two fully-connected hidden layers as the policy approximator.
  • For KoGuN with a normal network (KoGuN-concat) as the refine module, the authors use a neural network with two fully-connected hidden layers for the refine module.
  • For KoGuN with hypernetworks (KoGuN-hyper), the authors use hypernetworks to generate a refine module with one hidden layer.
  • All hidden layers described above have 32 units. The weight w1 is set to 0.7 at the beginning of training and decays to 0.1 by the end of the training phase (see the sketch after this list).
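The following is a minimal sketch of how a KoGuN-concat style policy head could be put together from the description above. It assumes PyTorch, a discrete action space, and a simple convex mixing of the prior (rule-based) action distribution with the refined distribution using the weight w1; these details are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative KoGuN-concat style policy head (sketch only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class KoGuNConcatPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=32):
        super().__init__()
        # Refine module: two fully-connected hidden layers of 32 units,
        # fed with the state concatenated with the prior action distribution.
        self.refine = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, prior_probs, w1):
        # prior_probs: action distribution suggested by the fuzzy-rule
        # knowledge controller; w1 anneals from 0.7 to 0.1 over training.
        logits = self.refine(torch.cat([state, prior_probs], dim=-1))
        refined_probs = F.softmax(logits, dim=-1)
        # Convex combination keeps the output a valid distribution.
        return w1 * prior_probs + (1.0 - w1) * refined_probs
```

The KoGuN-hyper variant would instead use a hypernetwork [Ha et al., 2016] to generate the weights of a one-hidden-layer refine module; that variant is omitted here for brevity.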
Results
  • The authors designed some rules for each task.
  • Here the authors only give the rules used in CartPole as an example.
  • In CartPole, a pole is attached by an un-actuated joint to a cart.
  • The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units away from the center.
  • Six rules are used in the experiment (the full list is given in the paper); an illustrative encoding is sketched below.
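The six CartPole rules themselves are not reproduced in this summary. As a purely hypothetical illustration of how such a fuzzy rule could be encoded, the sketch below uses triangular membership functions and the min t-norm; the rule, thresholds, and function names are invented for this example and are not the paper's actual rule set.

```python
# Hypothetical encoding of one fuzzy rule for CartPole (illustrative only;
# not the paper's actual rule set).
def triangular(x, left, center, right):
    """Triangular membership function, returning a degree in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

def rule_pole_falling_right(state):
    """IF pole angle is positive AND angular velocity is positive
    THEN push the cart to the right (discrete action 1)."""
    _, _, angle, angle_velocity = state  # Gym CartPole observation layout
    strength = min(
        triangular(angle, 0.0, 0.13, 0.26),         # pole leaning right (rad)
        triangular(angle_velocity, 0.0, 1.0, 2.0),  # pole falling further right
    )
    return strength, 1  # (activation degree via the min t-norm, action)
```

A knowledge controller would evaluate every rule in this form and aggregate the suggested actions, weighted by activation degree, into the prior action distribution consumed by the refine module.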
Conclusion
  • The authors propose a novel policy network framework called KoGuN to leverage human knowledge to accelerate the learning process of RL agents.
  • The proposed framework consists of a knowledge controller and a refine module.
  • The knowledge controller represents human suboptimal knowledge using fuzzy rules and the refine module refines the imprecise prior knowledge.
  • The authors evaluate the method on both discrete and continuous tasks, and the experimental results show that the method can significantly improve the learning efficiency of RL agents even with very low-performance human prior knowledge.
  • In future work, the authors would like to investigate knowledge representation methods for more challenging tasks, such as tasks with high-dimensional visual data as the state space.
References
  • [Barto and Sutton, 1982] Andrew G. Barto and Richard S. Sutton. Simulation of anticipatory responses in classical conditioning by a neuron-like adaptive element. Behavioural Brain Research, 4(3):221–235, 1982.
  • [Berenji, 1992] Hamid R. Berenji. A reinforcement learning-based architecture for fuzzy logic control. International Journal of Approximate Reasoning, 6(2):267–292, 1992.
  • [Brockman et al., 2016] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym, 2016.
  • [Collobert et al., 2011] Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537, 2011.
  • [Fischer et al., 2019] Marc Fischer, Mislav Balunovic, Dana Drachsler-Cohen, Timon Gehr, Ce Zhang, and Martin Vechev. DL2: Training and querying neural networks with logic. In International Conference on Machine Learning, pages 1931–1941, 2019.
  • [Garcez et al., 2012] Artur S. d'Avila Garcez, Krysia B. Broda, and Dov M. Gabbay. Neural-Symbolic Learning Systems: Foundations and Applications. Springer Science & Business Media, 2012.
  • [Gordon et al., 2011] Geoffrey J. Gordon, David B. Dunson, and Miroslav Dudık, editors. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2011, Fort Lauderdale, USA, April 11-13, 2011, volume 15 of JMLR Proceedings. JMLR.org, 2011.
  • [Ha et al., 2016] David Ha, Andrew Dai, and Quoc V. Le. Hypernetworks. arXiv preprint arXiv:1609.09106, 2016.
  • [Hester et al., 2017] Todd Hester, Matej Vecerık, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, and Audrunas Gruslys. Learning from demonstrations for real world reinforcement learning. CoRR, abs/1704.03732, 2017.
  • [Ho and Ermon, 2016] Jonathan Ho and Stefano Ermon. Generative adversarial imitation learning. In Advances in Neural Information Processing Systems, pages 4565–4573, 2016.
  • [Hu et al., 2016] Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, and Eric Xing. Harnessing deep neural networks with logic rules. arXiv preprint arXiv:1603.06318, 2016.
  • [Kingma and Ba, 2015] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015.
  • [Mnih et al., 2015] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, and G. Ostrovski. Human-level control through deep reinforcement learning. Nature, 518(7540):529, 2015.
  • [Pomerleau, 1991] Dean Pomerleau. Efficient training of artificial neural networks for autonomous navigation. Neural Computation, 3(1):88–97, 1991.
  • [Schmidhuber, 1992] Jurgen Schmidhuber. Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Neural Computation, 4(1):131–139, 1992.
  • [Schulman et al., 2015a] John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning, pages 1889–1897, 2015.
  • [Schulman et al., 2015b] John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015.
  • [Schulman et al., 2017] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  • [Silver et al., 2016] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, and M. Lanctot. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
  • [Sutton and Barto, 2018] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
  • [Tasfi, 2016] Norman Tasfi. PyGame Learning Environment. https://github.com/ntasfi/PyGame-Learning-Environment, 2016.
  • [Wu et al., 2017] Yuhuai Wu, Elman Mansimov, Roger B. Grosse, Shun Liao, and Jimmy Ba. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In Advances in Neural Information Processing Systems, pages 5279–5288, 2017.
  • [Yager and Zadeh, 2012] Ronald R. Yager and Lotfi A. Zadeh. An Introduction to Fuzzy Logic Applications in Intelligent Systems, volume 165. Springer Science & Business Media, 2012.
  • [Zadeh, 1965] L. A. Zadeh. Fuzzy sets. Information and Control, 8(3):338–353, 1965.
  • [Zadeh, 1996] Lotfi A. Zadeh. Knowledge representation in fuzzy logic. In Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems: Selected Papers by Lotfi A. Zadeh, pages 764–774. World Scientific, 1996.