Dual ascent and primal-dual algorithms for infinite-horizon nonstationary Markov decision processes

SIAM J. Optim. (2023)

Abstract
Infinite-horizon nonstationary Markov decision processes (MDPs) extend their stationary counterparts by allowing temporal variations in immediate costs and transition probabilities. Bellman's characterization of optimality and equivalent primal-dual linear programming formulations for these MDPs include a countably infinite number of variables and equations. Simple policy iteration, also viewed as a primal simplex algorithm, is the state of the art in solving these MDPs. It produces a sequence of policies whose costs-to-go converge monotonically from above to optimal. This approach suffers from two limitations: a cost-improving policy update is computationally expensive, and an optimality gap is missing. We propose two dual-based approaches to address these concerns. The first, called dual ascent, maintains approximate costs-to-go (dual variables) and corresponding non-negative errors in Bellman's equations. The dual variables are iteratively increased such that errors vanish asymptotically. This guarantees that dual variables converge monotonically from below to optimal. Dual ascent itself has two limitations, because it does not maintain a sequence of policies (primal variables): it does not provide a decision-making strategy at termination, and it does not offer an upper bound on the optimal costs-to-go. The second approach, termed the primal-dual method, addresses these limitations. It maintains a primal policy, dual approximations of its costs-to-go, and the corresponding non-negative errors in Bellman's equations, and it inherits monotonic dual value convergence. The key is a so-called rebalancing step, which leads to a duality gap-based stopping criterion and also primal value convergence. Computational experiments demonstrate the benefits of primal-dual over dual ascent and that primal-dual is orders of magnitude faster than simple policy iteration.
Keywords
dynamic programming, Bellman's equations, value convergence
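
Illustrative sketch

The abstract's central objects (dual variables that bound the optimal costs-to-go from below, non-negative errors in Bellman's equations, and a policy whose cost bounds the optimum from above) can be illustrated on a toy problem. The sketch below is not the paper's algorithm. It assumes a discounted nonstationary MDP truncated at a finite horizon T with zero terminal value, and it uses a simple coordinate update (raise each dual variable by its smallest Bellman slack) chosen here only because it keeps all slacks non-negative. All names, data, and the discount factor are hypothetical.

```python
GAMMA = 0.9  # discount factor (assumed for this toy example)

# Toy nonstationary data, truncated at T = 3 periods; 2 states, 2 actions.
# COST[t][s][a] is the immediate cost, PROB[t][s][a][s2] the transition probability.
COST = [
    [[1.0, 2.0], [0.5, 1.5]],
    [[2.0, 0.5], [1.0, 1.0]],
    [[0.3, 0.8], [2.0, 0.1]],
]
PROB = [
    [[[0.8, 0.2], [0.3, 0.7]], [[0.8, 0.2], [0.3, 0.7]]],
    [[[0.3, 0.7], [0.8, 0.2]], [[0.3, 0.7], [0.8, 0.2]]],
    [[[0.8, 0.2], [0.3, 0.7]], [[0.8, 0.2], [0.3, 0.7]]],
]
T, N_STATES = len(COST), len(COST[0])


def q_value(v_next, t, s, a):
    """Immediate cost of action a at (t, s) plus discounted expected v_next."""
    return COST[t][s][a] + GAMMA * sum(
        p * v_next[s2] for s2, p in enumerate(PROB[t][s][a])
    )


def slacks(v, t, s):
    """Bellman slack of every action at (t, s); all >= 0 means v is feasible there."""
    return [q_value(v[t + 1], t, s, a) - v[t][s] for a in range(len(COST[t][s]))]


def ascent_sweep(v):
    """Raise each v[t][s] by its smallest slack; no slack becomes negative."""
    for t in range(T):
        for s in range(N_STATES):
            v[t][s] += max(0.0, min(slacks(v, t, s)))


def greedy_policy(v):
    """At each (t, s), pick the action with the smallest slack with respect to v."""
    return [
        [
            min(range(len(COST[t][s])), key=lambda a: q_value(v[t + 1], t, s, a))
            for s in range(N_STATES)
        ]
        for t in range(T)
    ]


def policy_cost(policy):
    """Exact costs-to-go of a fixed policy by backward evaluation (an upper bound)."""
    u = [[0.0] * N_STATES for _ in range(T + 1)]
    for t in reversed(range(T)):
        for s in range(N_STATES):
            u[t][s] = q_value(u[t + 1], t, s, policy[t][s])
    return u


# v = 0 is a feasible start because all costs are non-negative.
v = [[0.0] * N_STATES for _ in range(T + 1)]
gaps = []
for _ in range(T + 1):
    ascent_sweep(v)
    u = policy_cost(greedy_policy(v))
    # Largest difference between the upper bound u and the lower bound v.
    gaps.append(max(u[t][s] - v[t][s] for t in range(T) for s in range(N_STATES)))
print(gaps)
```

In this finite truncation the sweeps amount to repeated Bellman updates, so the gap closes after at most T sweeps. The infinite-horizon setting of the paper has countably many variables and needs the more careful constructions the abstract describes.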