Data-based Optimal Control for Discrete-time Systems via Deep Deterministic Policy Gradient Adaptive Dynamic Programming

2019 9th International Conference on Information Science and Technology (ICIST), 2019

Cited by: 0 | Views: 13
Abstract
The model-free optimal control problem for discrete-time systems is addressed in this paper using a deep deterministic policy gradient adaptive dynamic programming (DDPGADP) algorithm. System data are collected through off-policy learning, and the control law is updated via the policy gradient. The convergence of the DDPGADP algorithm is established by showing that the Q-function sequence is monotonically non-increasing and converges to the optimum. To implement this method, an actor-critic neural network structure is constructed, and the target-network technique from deep Q-learning is adopted during neural network training. Finally, simulation examples are presented to verify the effectiveness of the proposed method.
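To make the actor-critic structure with target networks concrete, the following Python (PyTorch) code is a minimal sketch of one DDPG-style update from a batch of off-policy system data. It is not the authors' implementation: the state and action dimensions, network sizes, learning rates, discount factor gamma, and soft-update rate tau are illustrative assumptions, and the Q-function is treated as a cost-to-go that the actor minimizes, consistent with the optimal control setting.

# Minimal DDPG-style actor-critic update with target networks (illustrative sketch,
# not the DDPGADP implementation from the paper).
import copy
import torch
import torch.nn as nn

state_dim, action_dim = 3, 1          # assumed dimensions of the discrete-time system
gamma, tau = 0.99, 0.005              # assumed discount factor and soft-update rate

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(), nn.Linear(64, 1))
actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(batch):
    """One off-policy update from a batch of transitions (x_k, u_k, cost, x_{k+1})."""
    x, u, c, x_next = batch
    # Critic: regress Q(x_k, u_k) onto the Bellman target built with the target networks.
    with torch.no_grad():
        u_next = actor_target(x_next)
        q_target = c + gamma * critic_target(torch.cat([x_next, u_next], dim=1))
    q = critic(torch.cat([x, u], dim=1))
    critic_loss = nn.functional.mse_loss(q, q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: deterministic policy gradient; minimize Q(x_k, pi(x_k)) since Q is a cost-to-go here.
    actor_loss = critic(torch.cat([x, actor(x)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Soft update of the target networks (target-network technique from deep Q-learning).
    for net, target in ((actor, actor_target), (critic, critic_target)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)

# Example call with a dummy batch of 32 transitions (x_k, u_k, cost, x_{k+1}):
batch = (torch.randn(32, state_dim), torch.randn(32, action_dim),
         torch.rand(32, 1), torch.randn(32, state_dim))
update(batch)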
Keywords
Adaptive dynamic programming,Deep deterministic policy gradient,Optimal control,Neural networks,Model-free