Finite Time Analysis of Constrained Actor Critic and Constrained Natural Actor Critic Algorithms

CoRR (2023)


Abstract

Actor-critic methods have been applied widely across reinforcement learning tasks, especially when the state-action space is large. In this paper, we consider actor-critic and natural actor-critic algorithms with function approximation for constrained Markov decision processes (C-MDP) involving inequality constraints, and carry out a non-asymptotic analysis of both algorithms in a non-i.i.d. (Markovian) setting. We consider the long-run average cost criterion, where both the objective and the constraint functions are suitable policy-dependent long-run averages of certain prescribed cost functions. We handle the inequality constraints using the Lagrange multiplier method. We prove that these algorithms are guaranteed to find a first-order stationary point (i.e., ‖∇L(θ,γ)‖_2^2 ≤ ϵ) of the performance (Lagrange) function L(θ,γ), with a sample complexity of Õ(ϵ^{-2.5}) for both the Constrained Actor Critic (C-AC) and Constrained Natural Actor Critic (C-NAC) algorithms. We also show the results of experiments on three different Safety-Gym environments.
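
The Lagrangian relaxation mentioned above can be sketched as follows. The notation is assumed for illustration and is not taken from the paper: J(θ) stands for the long-run average objective cost under the policy with parameter θ, G_k(θ) for the k-th long-run average constraint cost, α_k for its threshold, and γ_k ≥ 0 for the Lagrange multipliers. The last line is plain arithmetic on the stated rate, ignoring constants and logarithmic factors.

```latex
% Assumed notation (J, G_k, alpha_k, N are hypothetical); a sketch, not the paper's formulation.
\begin{align*}
&\min_{\theta} \; J(\theta) \quad \text{subject to} \quad G_k(\theta) \le \alpha_k, \quad k = 1, \dots, N, \\
&L(\theta, \gamma) = J(\theta) + \sum_{k=1}^{N} \gamma_k \bigl( G_k(\theta) - \alpha_k \bigr), \qquad \gamma_k \ge 0, \\
&\epsilon = 10^{-2} \;\Longrightarrow\; \epsilon^{-2.5} = 10^{5} \ \text{samples, up to constants and logarithmic factors.}
\end{align*}
```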


Keywords

constrained actor critic, finite time analysis, constrained natural
