SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning

Xiaoshuang Li, Xiao Wang,Xinhu Zheng, Junchen Jin,Yanhao Huang, Jun Jason Zhang,Fei-Yue Wang

Neurocomputing(2022)

引用 7|浏览138
暂无评分
摘要
Deep Reinforcement Learning (DRL) has proven its capability to learn optimal policies in decision-making problems by directly interacting with environments. Meanwhile, supervised learning methods also show great capability of learning from data. However, how to combine DRL with supervised learning and leverage additional knowledge and data to assist the DRL agent remains difficult. This study proposes a novel Supervised Assisted Deep Reinforcement Learning (SADRL) framework integrating deep Q-learning from dynamic demonstrations with a behavioral cloning model (DQfDD-BC). Specifically, the proposed DQfDD-BC method leverages historical demonstrations to pre-train a behavioral cloning model and consistently update it by learning the dynamically updated demonstrations. A supervised expert loss function is designed to compare actions generated by the DRL model with those obtained from the BC model to provide advantageous guidance for policy improvements. Experimental results in several OpenAI Gym environments show that the proposed approach accelerates the learning processes, and meanwhile, adapts to different performance levels of demonstrations. As illustrated in an ablation study, the dynamic demonstration and expert loss mechanisms using a BC model contribute to improving the learning convergence performance compared with the baseline models. We believe that SADRL provides an elegant framework and the proposed method can promote the integration of human experience and machine intelligence.
更多
查看译文
关键词
Deep reinforcement learning,Behavioral cloning,Dynamic demonstration,Double DQN
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要