Enhancing Twin Delayed Deep Deterministic Policy Gradient with Cross-Entropy Method

2021 8th NAFOSTED Conference on Information and Computer Science (NICS)(2021)

Cited by 2 | Views 11
Abstract
Hybridizations of Deep Reinforcement Learning (DRL) and Evolutionary Computation (EC) methods have recently shown considerable success in a variety of high-dimensional physical control tasks. These hybrid frameworks offer more robust mechanisms for exploration and exploitation in the policy network parameter search space by stabilizing the gradient-based updates of DRL algorithms with population-based operations adopted from EC methods. In this paper, we propose a novel hybrid framework that effectively combines the efficiency of DRL updates with the stability of EC populations. We experiment with integrating the Twin Delayed Deep Deterministic Policy Gradient (TD3) and the Cross-Entropy Method (CEM). The resulting EC-enhanced TD3 algorithm (eTD3) is compared with the baseline algorithm TD3 and a state-of-the-art evolutionary reinforcement learning (ERL) method, CEM-TD3. Experimental results on five MuJoCo continuous control benchmark environments confirm the efficacy of our approach. The source code of the paper is available at https://github.com/ELO-Lab/eTD3.
Keywords
deep reinforcement learning,policy search,evolutionary computation,cross-entropy method