An Empirical Algorithm for Relative Value Iteration for Average-Cost MDPs

2015 54th IEEE Conference on Decision and Control (CDC)

Cited by 9
Abstract
Infinite-horizon average-cost Markov decision processes arise in many applications. A dynamic programming algorithm, called relative value iteration, is used to compute the optimal value function, but for large state spaces its computational burden is often prohibitive. We propose a simulation-based dynamic program called empirical relative value iteration (ERVI). The idea is simple: replace the expectation in the Bellman operator with a sample-average estimate, and then use a projection to ensure boundedness of the iterates. We establish that the ERVI iterates converge to the optimal value function in the span seminorm, in probability, as the number of samples goes to infinity. Simulation results show remarkably good performance even with a small number of samples.
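
The abstract gives enough detail to sketch the update rule. Below is a minimal, illustrative Python sketch of an ERVI-style iteration as described: the expectation in the Bellman operator is replaced by a sample average over simulated next states, a fixed reference state anchors the relative step, and a projection (here, clipping onto a sup-norm ball) keeps the iterates bounded. The function names (`ervi`, `sample_next_state`), the choice of reference state, and the projection radius are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def span(v):
    """Span seminorm: sp(v) = max(v) - min(v)."""
    return v.max() - v.min()

def ervi(costs, sample_next_state, n_states, n_actions,
         n_samples=10, n_iters=100, radius=1e3, seed=0):
    """Empirical relative value iteration (illustrative sketch).

    costs[s, a]                  : one-step cost of action a in state s
    sample_next_state(s, a, rng) : simulator returning one next state
    n_samples                    : samples per (s, a) in the empirical update
    radius                       : projection radius bounding the iterates
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(n_states)
    for _ in range(n_iters):
        tv = np.empty(n_states)
        for s in range(n_states):
            q = np.empty(n_actions)
            for a in range(n_actions):
                # Empirical Bellman operator: replace E[v(s')] with a
                # sample average over simulated next states.
                nxt = [sample_next_state(s, a, rng) for _ in range(n_samples)]
                q[a] = costs[s, a] + np.mean(v[nxt])
            tv[s] = q.min()
        # Relative step: subtract the value at a fixed reference state.
        v = tv - tv[0]
        # Projection onto a sup-norm ball to keep the iterates bounded.
        v = np.clip(v, -radius, radius)
    return v

# Example on a tiny random MDP (illustrative only).
rng0 = np.random.default_rng(1)
P = rng0.dirichlet(np.ones(4), size=(4, 2))   # transition kernel P[s, a, s']
C = rng0.random((4, 2))                        # cost matrix
sampler = lambda s, a, rng: rng.choice(4, p=P[s, a])
v_hat = ervi(C, sampler, n_states=4, n_actions=2)
```

Clipping stands in here for whatever projection the paper uses; since relative value iterates matter only up to an additive constant, convergence is naturally measured in the span seminorm sp(v) = max(v) - min(v), which ignores such constants.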
Keywords
Heuristic algorithms, Convergence, Markov processes, Approximation algorithms, Dynamic programming, Decision making, Computational complexity