Mixed Reinforcement Learning for Efficient Policy Optimization in Stochastic Environments

2020 20th International Conference on Control, Automation and Systems (ICCAS) (2020)

Cited by 6 | Viewed 6
Abstract
Reinforcement learning has the potential to control stochastic nonlinear systems optimally. We propose a mixed reinforcement learning (mixed RL) algorithm that searches for the optimal policy by simultaneously using dual representations of the environmental dynamics: an empirical dynamic model and a set of state-action data. The former can embed the designer's knowledge and reduce the difficulty of learning, while the latter can compensate for model inaccuracy since it accurately reflects the real system dynamics. This design improves both learning accuracy and training speed. In the mixed RL framework, the additive uncertainty of the stochastic model is compensated for by using explored state-action data via an iterative Bayesian estimator (IBE). The optimal policy is then computed iteratively by alternating between policy evaluation (PEV) and policy improvement (PIM). The effectiveness of mixed RL is demonstrated on a typical optimal control problem for stochastic non-affine nonlinear systems, i.e., a double-lane-change task with an automated vehicle.
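The ingredients described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's implementation; it assumes a scalar linear system x' = a·x + b·u + d + w whose nominal model omits the additive uncertainty d. A Gaussian conjugate update on model residuals plays the role of the iterative Bayesian estimator, and PEV/PIM alternation recovers the optimal feedback gain on the compensated model. All variable names (a, b, d_true, mu_d, k) are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# True dynamics: x' = a*x + b*u + d + w; the nominal model omits d.
a, b, d_true, sigma_w = 0.9, 0.5, 0.3, 0.05

# --- Iterative Bayesian estimation of the additive uncertainty d ---
# Gaussian conjugate update from model residuals r = x' - (a*x + b*u).
mu_d, var_d = 0.0, 1.0          # prior over d
x = 1.0
for _ in range(200):
    u = -x                      # simple exploratory control
    x_next = a * x + b * u + d_true + sigma_w * rng.standard_normal()
    r = x_next - (a * x + b * u)        # residual observes d + noise
    var_post = 1.0 / (1.0 / var_d + 1.0 / sigma_w**2)
    mu_d = var_post * (mu_d / var_d + r / sigma_w**2)
    var_d = var_post
    x = x_next

# --- Policy iteration (PEV/PIM) on the compensated model ---
# Quadratic cost q*x^2 + r_u*u^2; affine policy u = -k*x - mu_d/b,
# where the constant term cancels the estimated bias.
q, r_u = 1.0, 0.1
k = 0.0
for _ in range(50):
    acl = a - b * k             # closed-loop dynamics (bias cancelled)
    # PEV: value V(x) = P*x^2 solves P = q + r_u*k^2 + acl^2 * P
    P = (q + r_u * k**2) / (1.0 - acl**2)
    # PIM: minimize the Q-function over u, yielding a new gain
    k = (b * P * a) / (r_u + b**2 * P)

print(f"estimated bias: {mu_d:.3f} (true {d_true})")
print(f"feedback gain:  {k:.3f}, closed-loop pole {a - b * k:.3f}")
```

The two loops mirror the paper's structure: data compensates the model first, then policy evaluation and improvement alternate on the compensated dynamics until the gain converges.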
Keywords
Reinforcement learning, Bayesian estimation, Dynamic model, State-action data