Solving Long-run Average Reward Robust MDPs via Stochastic Games
CoRR (2023)
Abstract
Markov decision processes (MDPs) provide a standard framework for sequential
decision making under uncertainty. However, transition probabilities in MDPs
are often estimated from data, and MDPs do not take data uncertainty into
account. Robust Markov decision processes (RMDPs) address this shortcoming of
MDPs by assigning to each transition an uncertainty set rather than a single
probability value. The goal of solving RMDPs is then to find a policy which
maximizes the worst-case performance over the uncertainty sets. In this work,
we consider polytopic RMDPs in which all uncertainty sets are polytopes and
study the problem of solving long-run average reward polytopic RMDPs. Our focus
is on computational complexity aspects and efficient algorithms. We present a
novel perspective on this problem and show that it can be reduced to solving
long-run average reward turn-based stochastic games with finite state and
action spaces. This reduction allows us to derive several important
consequences that were hitherto not known to hold for polytopic RMDPs. First,
we derive new computational complexity bounds for solving long-run average
reward polytopic RMDPs, showing for the first time that the threshold decision
problem for them is in NP ∩ coNP and that they admit a randomized algorithm with
sub-exponential expected runtime. Second, we present Robust Polytopic Policy
Iteration (RPPI), a novel policy iteration algorithm for solving long-run
average reward polytopic RMDPs. Our experimental evaluation shows that RPPI is
much more efficient in solving long-run average reward polytopic RMDPs compared
to state-of-the-art methods based on value iteration.
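
To make the reduction concrete, here is a minimal sketch assuming a toy encoding in which each polytopic uncertainty set is represented by a list of its vertices. The function name to_turn_based_game and the dictionary-based game representation are hypothetical, not from the paper; the sketch only illustrates the underlying idea that, because the worst case of a linear objective over a polytope is attained at a vertex, the adversary in the induced turn-based stochastic game can be restricted to choosing among the finitely many vertices of each uncertainty set.

```python
# A toy polytopic RMDP in a hypothetical vertex-based encoding:
# rmdp[state][action] = (reward, [vertex_0, vertex_1, ...]), where each
# vertex is a dict mapping successor states to probabilities.
rmdp = {
    0: {"a": (1.0, [{0: 0.9, 1: 0.1}, {0: 0.5, 1: 0.5}])},
    1: {"b": (0.0, [{0: 1.0}, {1: 1.0}])},
}

def to_turn_based_game(rmdp):
    """Build a finite turn-based stochastic game from a vertex-represented
    polytopic RMDP: the maximizer owns the RMDP states and picks actions;
    the minimizer owns intermediate (state, action) positions and picks a
    vertex of the uncertainty polytope, which fixes the successor
    distribution."""
    max_states, min_states = {}, {}
    for s, actions in rmdp.items():
        # Maximizer position s: each RMDP action leads to a minimizer position.
        max_states[s] = {a: ("min", (s, a)) for a in actions}
        for a, (reward, vertices) in actions.items():
            # Minimizer position (s, a): one move per polytope vertex; choosing
            # vertex i commits the adversary to that successor distribution.
            min_states[(s, a)] = {
                i: (reward, dist) for i, dist in enumerate(vertices)
            }
    return max_states, min_states

max_states, min_states = to_turn_based_game(rmdp)
print(max_states[0])         # {'a': ('min', (0, 'a'))}
print(min_states[(0, "a")])  # vertex choices, each fixing a distribution
```

The resulting game has finite state and action spaces, so standard techniques for long-run average reward turn-based stochastic games, such as policy iteration, become applicable.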