Online Adaptive Optimal Control of Discrete-time Linear Systems via Synchronous Q-learning
In this paper, a novel synchronous Q-learning method is proposed for solving discrete-time linear quadratic regulator (LQR) problems. First, the Bellman equation for the optimal Q-function is reformulated as a consistency equation in the parameters of the optimal Q-function and the optimal controller. An actor-critic structure is then introduced to learn the optimal Q-function and the optimal controller online, in real time, using state samples generated by the behavior policy. In particular, the proposed synchronous Q-learning scheme updates the Q-function approximation and the controller approximation simultaneously, rather than iterating between policy evaluation and policy improvement. The proposed control scheme is proved to be uniformly ultimately bounded (UUB) under appropriate learning rates, provided that certain persistence of excitation (PE) conditions are satisfied. Moreover, the PE conditions can be easily met by injecting appropriate exploration noise into the behavior policy without introducing any excitation noise bias. Finally, a simulation example is provided to verify the effectiveness of the proposed synchronous Q-learning method.
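The scheme described above can be sketched in code. The following is a minimal illustrative implementation, not the paper's exact algorithm: the system matrices, learning rates, exploration magnitude, and the normalized semi-gradient critic update are all assumptions chosen for a runnable toy example. The critic parameterizes Q(x, u) = zᵀHz with z = [x; u]; the actor maintains a gain K with u = −Kx; both are updated in the same time step from samples generated by the noise-injected behavior policy.

```python
import numpy as np

# Hypothetical 2-state, 1-input system (not from the paper): x' = A x + B u
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Qc = np.eye(2)           # state cost weight
Rc = 0.1 * np.eye(1)     # control cost weight
n, m = 2, 1

rng = np.random.default_rng(0)
H = np.eye(n + m)        # critic parameters: Q(x,u) = z^T H z, z = [x; u]
K = np.zeros((m, n))     # actor parameters: u = -K x
alpha_c, alpha_a = 0.005, 0.005   # illustrative learning rates

x = np.array([1.0, -1.0])
for step in range(2000):
    # behavior policy: current actor plus exploration noise (for PE)
    u = -K @ x + 0.2 * rng.standard_normal(m)
    x_next = A @ x + B @ u
    cost = x @ Qc @ x + u @ Rc @ u

    z = np.concatenate([x, u])
    z_next = np.concatenate([x_next, -K @ x_next])  # target evaluated with the actor

    # temporal-difference (Bellman) residual of the Q-function
    delta = z @ H @ z - (cost + z_next @ H @ z_next)

    # synchronous updates: critic and actor move in the SAME step,
    # with a normalized semi-gradient step on 0.5 * delta^2 for the critic
    H -= alpha_c * delta * np.outer(z, z) / (1.0 + z @ z) ** 2
    H = 0.5 * (H + H.T)                       # keep the critic symmetric
    huu = max(H[n, n], 1e-2)                  # guard against a bad curvature estimate
    K += alpha_a * (H[n:, :n] / huu - K)      # track K = H_uu^{-1} H_ux

    x = x_next
    if np.linalg.norm(x) > 5.0:               # keep the open-loop-unstable sim bounded
        x = rng.standard_normal(n)
```

Note the contrast with policy-iteration-style Q-learning: there is no inner loop that fully evaluates the current policy before improving it; both approximations drift together, which is what the UUB analysis under PE is designed to cover.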