LiFE: Deep Exploration via Linear-Feature Bonus in Continuous Control

Tsinghua Science and Technology (2023)

Abstract
Reinforcement Learning (RL) algorithms work well with well-defined rewards but fail under sparse or deceptive rewards, where additional exploration strategies are required. This work introduces a deep exploration method based on an Upper Confidence Bound (UCB) bonus. The proposed method can be plugged into any actor-critic algorithm that uses a deep neural network as its critic. Building on the regret-bound analysis under the linear Markov decision process approximation, we use the feature matrix to compute a UCB bonus for deep exploration. The proposed method reduces to count-based exploration in special cases and applies to general settings. Our method uses the last d-dimensional feature vector of the critic network and is easy to deploy. We design a simple task, “swim”, to demonstrate how the proposed method achieves exploration in sparse/deceptive reward environments. We then perform an empirical evaluation on sparse/deceptive-reward versions of Gym environments and on Ackermann robot control tasks. The results verify that the proposed algorithm performs effective deep exploration in sparse/deceptive reward tasks.
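To make the feature-based bonus concrete, below is a minimal sketch of the standard elliptical UCB bonus from linear-MDP analyses (as in LSVI-UCB), applied to a feature vector such as the critic's last-layer activation. It is an illustration of the general technique the abstract describes, not the paper's exact implementation; the hyperparameter names `beta` and `lam` are assumptions.

```python
import numpy as np

class LinearFeatureBonus:
    """Elliptical UCB bonus over a d-dimensional feature space.

    Sketch of b(s, a) = beta * sqrt(phi^T Lambda^{-1} phi), where
    Lambda = lam * I + sum of outer products of visited features.
    Here `phi` would be, e.g., the critic's last-layer feature vector;
    `beta` and `lam` are illustrative hyperparameters.
    """

    def __init__(self, feature_dim: int, beta: float = 1.0, lam: float = 1.0):
        self.beta = beta
        # Inverse of the regularized Gram matrix, maintained incrementally.
        self.cov_inv = np.eye(feature_dim) / lam

    def update(self, phi: np.ndarray) -> None:
        """Rank-one Sherman-Morrison update of Lambda^{-1} with a visited phi."""
        v = self.cov_inv @ phi
        self.cov_inv -= np.outer(v, v) / (1.0 + phi @ v)

    def bonus(self, phi: np.ndarray) -> float:
        """Bonus is large along feature directions that have been visited rarely."""
        return self.beta * float(np.sqrt(phi @ self.cov_inv @ phi))
```

When the features are one-hot indicators over state-action pairs, this bonus reduces to beta / sqrt(lam + N(s, a)), i.e., the familiar count-based bonus, which is consistent with the equivalence to count-based exploration mentioned in the abstract.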
Keywords
Reinforcement Learning (RL), Neural Network (NN), Upper Confidence Bound (UCB)