Approximability and efficient algorithms for constrained fixed-horizon POMDPs with durative actions

Artificial Intelligence (2023)

Abstract
The Partially Observable Markov Decision Process (POMDP) is a fundamental model for probabilistic planning in stochastic domains. More recently, constrained POMDPs and chance-constrained POMDPs have extended the model by allowing constraints to be placed on aspects of the policy in addition to the objective function. Despite their expressive power, these models assume that all actions take a fixed duration, which limits their ability to model real-world planning problems. In this work, we propose a unified model for durative POMDPs and their constrained extensions. First, we convert these extensions into an Integer Linear Programming (ILP) formulation, which can be solved with existing solvers from the ILP literature. Second, we provide a heuristic search approach that efficiently prunes the search space, guided by the solutions of successive partial ILP programs. Third, we give a theoretical analysis of the problem: unlike short-horizon POMDPs with constant-depth policies, which can be solved in polynomial time, the constrained extensions are NP-hard even with a planning horizon of two and non-negative rewards. To alleviate this, we propose a Fully Polynomial Time Approximation Scheme (FPTAS) that computes near-optimal deterministic policies in polynomial time. The approximation ratio of the FPTAS is among the best achievable in theory. Finally, evaluation results show that our approach empirically outperforms the state-of-the-art fixed-horizon chance-constrained POMDP solver. © 2023 Elsevier B.V. All rights reserved.
Keywords
Constrained partially observable Markov decision process, Durative actions, Risk-bounded planning, Heuristic search