Finite Time Analysis of Constrained Actor Critic and Constrained Natural Actor Critic Algorithms
CoRR(2023)
Abstract
Actor critic methods have been applied widely across reinforcement learning
tasks, especially when the state-action space is large.
In this paper, we consider actor critic and natural actor critic algorithms
with function approximation for constrained Markov decision processes (C-MDP)
involving inequality constraints and carry out a non-asymptotic analysis for
both of these algorithms in a non-i.i.d. (Markovian) setting. We consider the
long-run average cost criterion where both the objective and the constraint
functions are suitable policy-dependent long-run averages of certain prescribed
cost functions. We handle the inequality constraints using the Lagrange
multiplier method. We prove that these algorithms are guaranteed to find a
first-order stationary point (i.e., $\|\nabla L(\theta,\gamma)\|_2^2 \leq \epsilon$)
of the performance (Lagrange) function $L(\theta,\gamma)$, with
a sample complexity of $\tilde{O}(\epsilon^{-2.5})$ in the case of
both Constrained Actor Critic (C-AC) and Constrained Natural Actor Critic
(C-NAC) algorithms. We also show the results of experiments on three different
Safety-Gym environments.
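To illustrate the Lagrange multiplier idea mentioned in the abstract, the following is a minimal sketch on a toy deterministic problem, not the paper's C-AC or C-NAC algorithm. Everything in it is an assumption made for illustration: the scalar objective (theta - 2)^2, the constraint theta <= 1, the function name primal_dual, and the step sizes. It performs gradient descent on the parameter theta and projected gradient ascent on the multiplier gamma of the Lagrangian L(theta, gamma) = (theta - 2)^2 + gamma * (theta - 1), with a smaller step size for gamma so that the multiplier moves on a slower timescale.

```python
def primal_dual(steps=5000, lr_theta=0.05, lr_gamma=0.01):
    """Toy primal-dual iteration (hypothetical example, not from the paper).

    Minimizes f(theta) = (theta - 2)^2 subject to g(theta) = theta - 1 <= 0
    via the Lagrangian L(theta, gamma) = f(theta) + gamma * g(theta).
    """
    theta, gamma = 0.0, 0.0
    for _ in range(steps):
        # Gradient of L with respect to theta (primal variable).
        grad_theta = 2.0 * (theta - 2.0) + gamma
        # Gradient of L with respect to gamma is the constraint violation.
        grad_gamma = theta - 1.0
        # Descent step on theta (faster timescale).
        theta -= lr_theta * grad_theta
        # Ascent step on gamma, projected onto gamma >= 0 (slower timescale).
        gamma = max(0.0, gamma + lr_gamma * grad_gamma)
    return theta, gamma


if __name__ == "__main__":
    theta, gamma = primal_dual()
    print(f"theta = {theta:.4f}, gamma = {gamma:.4f}")
```

For this toy problem the KKT point is theta = 1 with multiplier gamma = 2, and the iterates approach it. In the setting of the abstract, the exact gradients would instead be replaced by noisy estimates built from Markovian samples and a critic with function approximation.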
Keywords
constrained actor critic, finite time analysis, constrained natural