Data-based Optimal Control for Discrete-time Systems via Deep Deterministic Policy Gradient Adaptive Dynamic Programming
2019 9th International Conference on Information Science and Technology (ICIST)(2019)
Abstract
The model-free optimal control problem for discrete-time systems is considered in this paper using the deep deterministic policy gradient adaptive dynamic programming (DDPGADP) algorithm. System data are obtained through off-policy learning, and the control law is updated by policy gradient. The convergence of the DDPGADP algorithm is verified by showing that the Q-function sequence is monotonically non-increasing and converges to the optimum. To implement the method, an actor-critic neural network structure is established, adopting the target network technique from deep Q-learning during neural network training. Finally, simulation examples are presented to verify the effectiveness of the proposed method.
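The actor-critic structure with target networks described in the abstract can be illustrated with a minimal sketch. The system (a hypothetical scalar plant x_{k+1} = a·x_k + b·u_k with quadratic cost), the linear-in-features critic, the linear actor, and all coefficients below are illustrative assumptions, not the paper's actual networks or examples; the sketch only shows the off-policy data collection, TD update against a target critic, deterministic policy gradient step, and soft target updates that the DDPGADP scheme combines.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar discrete-time plant x_{k+1} = a*x_k + b*u_k
# with stage cost x^2 + u^2 (illustrative choice, not the paper's example).
a, b = 0.9, 0.5
gamma = 0.95          # discount factor
tau = 0.01            # soft-update rate for the target networks

# Linear "networks": critic Q(x,u) ~ w . [x^2, x*u, u^2], actor u = k*x
w = np.zeros(3)       # critic weights
k = 0.0               # actor gain
w_targ, k_targ = w.copy(), k

alpha_c, alpha_a = 0.05, 0.01   # learning rates

def feats(x, u):
    return np.array([x * x, x * u, u * u])

def q(weights, x, u):
    return feats(x, u) @ weights

for step in range(5000):
    # Off-policy data: random state, behaviour policy = actor + exploration noise
    x = rng.uniform(-1.0, 1.0)
    u = k * x + 0.3 * rng.standard_normal()
    cost = x * x + u * u
    x_next = a * x + b * u
    u_next = k_targ * x_next                 # target actor action
    # TD target uses the target critic (the deep Q-learning trick)
    y = cost + gamma * q(w_targ, x_next, u_next)
    # Critic update: gradient step on the squared TD error
    td = q(w, x, u) - y
    w -= alpha_c * td * feats(x, u)
    # Actor update: deterministic policy gradient, dQ/du * du/dk
    dq_du = w[1] * x + 2.0 * w[2] * (k * x)
    k -= alpha_a * dq_du * x
    # Soft update of both target networks
    w_targ = (1 - tau) * w_targ + tau * w
    k_targ = (1 - tau) * k_targ + tau * k

print("learned feedback gain:", k)
```

Under these assumptions the cost-minimizing gain is negative (stabilizing feedback), so k should drift below zero as the critic's cross-term weight is learned; the soft updates keep the TD target slowly moving, which is the stabilizing role the target networks play.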
Keywords
Adaptive dynamic programming, Deep deterministic policy gradient, Optimal control, Neural networks, Model-free