Online Policy Iteration Algorithms for Linear Continuous-Time H-Infinity Regulation With Completely Unknown Dynamics

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING (2023)

Abstract
This paper proposes two online policy iteration (PI) algorithms for solving linear continuous-time H-infinity regulation problems with completely unknown dynamics. Our results are completely learning-oriented in the sense that the prior model knowledge previously needed to obtain an initial stabilizing control policy for the Game Algebraic Riccati Equation (GARE) associated with the H-infinity regulation problem is removed, thereby resolving a long-standing obstacle to fully model-free learning in existing PI works. To this end, two offline PI algorithms, a single-looped one and a double-looped one, are first proposed by nesting a homotopy-based initialization to solve a series of Lyapunov equations associated with the GARE. Two online PI algorithms are then proposed that use measured system data to avoid the model requirement when solving the GARE online. The single-looped PI algorithm learns the control and disturbance policies simultaneously, while the double-looped PI algorithm updates the control policy before carrying out a series of disturbance-policy learning steps. Both online PI algorithms work in a model-free manner, requiring no prior knowledge of the system matrices over the whole learning period, including the control policy initialization, and lead to the desired control policy with satisfactory system performance. We demonstrate the effectiveness of the proposed learning algorithms with a power system example.

Note to Practitioners: Solving the H-infinity regulation problem for linear continuous-time systems amounts to finding the Nash equilibrium of a two-player zero-sum game. However, designing an H-infinity regulation controller with completely unknown dynamics is challenging for control practitioners, since obtaining precise prior knowledge of the models/dynamics of many engineering systems is nontrivial.
Current methods usually use system data to compute the Nash equilibrium solution by offline or online iterative computation, but most of them still require prior knowledge of the system dynamics during initialization, for example to find stabilizing/admissible control or disturbance policies. To address this challenge, this paper develops two homotopy-based online PI algorithms that solve the H-infinity regulation problem in a fully model-free manner. It is shown that the developed algorithms can find the Nash equilibrium solution by measuring system data online, and overcome the difficulty of finding an initial stabilizing control policy when the system dynamics are unknown. The validity of the algorithms is illustrated through a simulation study.
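The abstract describes policy iteration as solving a sequence of Lyapunov equations associated with the GARE, with policy-improvement steps for both the control and the disturbance player. The paper's algorithms are data-driven and model-free; the sketch below is instead a minimal *model-based* illustration of the underlying evaluation/improvement cycle (simultaneous updates of both policies, in the spirit of the single-looped variant), using hypothetical example matrices `A`, `B`, `D`, `Q` and attenuation level `gamma` that are not taken from the paper. The open-loop system is chosen stable so that zero initial policies are admissible, which is exactly the assumption the paper's homotopy-based initialization is designed to remove.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical example system (not from the paper): open-loop stable,
# so K = 0, L = 0 is an admissible starting policy pair.
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [0.0]])      # control input matrix
D = np.array([[0.0], [1.0]])      # disturbance input matrix
Q = np.eye(2)                     # state weighting
gamma = 5.0                       # prescribed disturbance attenuation level

K = np.zeros((1, 2))              # control policy u = -K x
L = np.zeros((1, 2))              # disturbance policy w = L x
P = np.zeros((2, 2))

for _ in range(50):
    # Closed-loop matrix under the current pair of policies
    Ac = A - B @ K + D @ L
    M = Q + K.T @ K - gamma**2 * (L.T @ L)
    # Policy evaluation: solve the Lyapunov equation Ac' P + P Ac + M = 0
    P_new = solve_continuous_lyapunov(Ac.T, -M)
    # Policy improvement for both players (simultaneous update)
    K = B.T @ P_new
    L = (1.0 / gamma**2) * (D.T @ P_new)
    if np.linalg.norm(P_new - P) < 1e-10:
        P = P_new
        break
    P = P_new

# At convergence, P should satisfy the GARE:
# A'P + PA + Q - P B B' P + gamma^{-2} P D D' P = 0
res = (A.T @ P + P @ A + Q - P @ B @ B.T @ P
       + (1.0 / gamma**2) * (P @ D @ D.T @ P))
```

The paper's contribution is precisely to carry out these evaluation and improvement steps from measured trajectory data, without access to `A`, `B`, or `D`, and to generate the admissible initial policies via a homotopy rather than assuming them.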
Keywords
H-infinity control, reinforcement learning, policy iteration, homotopy-based strategy, two-player zero-sum game