The optimal probability of the risk for finite horizon partially observable Markov decision processes

Xian Wen, Haifeng Huo, Jinhua Cui

AIMS Mathematics (2023)

Abstract
This paper investigates the optimality of the risk probability for finite-horizon, partially observable, discrete-time Markov decision processes (POMDPs). The quantity to be optimized is the risk probability, that is, the probability that the total reward does not exceed a preset goal value, which distinguishes the problem from the optimization of expected rewards. Using the Bayes operator and the filter equations, the risk probability optimization problem is equivalently reformulated as a filtered Markov decision process. By developing the value iteration technique, the optimality equation satisfied by the value function is established and the existence of a risk probability optimal policy is proven. Finally, an example illustrates the use of the value iteration algorithm to compute the value function and an optimal policy.
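The filtering and value iteration steps named in the abstract can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the paper's model or algorithm: the two-state kernels `P` and `Q`, the integer reward table `R` (assumed here to depend on the action and the resulting observation, so the remaining goal is known to the decision maker), and the function names `bayes_update` and `risk_value` are all hypothetical. The recursion minimizes the probability that the total reward does not exceed the goal.

```python
from functools import lru_cache

# Hypothetical toy model (not taken from the paper).
# P[a][s][s2]: probability of moving from hidden state s to s2 under action a.
P = {
    0: [[0.9, 0.1], [0.2, 0.8]],
    1: [[0.5, 0.5], [0.5, 0.5]],
}
# Q[a][s2][o]: probability of observing o after landing in hidden state s2.
Q = {
    0: [[0.8, 0.2], [0.3, 0.7]],
    1: [[0.8, 0.2], [0.3, 0.7]],
}
# R[a][o]: integer reward, assumed to depend on the action and observation only.
R = {0: [1, 0], 1: [2, 0]}
N_OBS = 2


def bayes_update(belief, a, o):
    """Bayes filter step: return (P(o | belief, a), posterior belief)."""
    n = len(belief)
    joint = [
        sum(belief[s] * P[a][s][s2] for s in range(n)) * Q[a][s2][o]
        for s2 in range(n)
    ]
    p_o = sum(joint)
    if p_o == 0.0:
        return 0.0, belief  # observation impossible; posterior is never used
    return p_o, tuple(j / p_o for j in joint)


@lru_cache(maxsize=None)
def risk_value(belief, goal, steps):
    """Return (minimal risk probability, minimizing first action).

    The risk is the probability that the reward collected over the
    remaining `steps` stages does not exceed `goal`.
    """
    if steps == 0:
        # Nothing more is collected: the risk event holds iff 0 <= goal.
        return (1.0 if goal >= 0 else 0.0), None
    best_risk, best_action = 2.0, None
    for a in P:
        risk = 0.0
        for o in range(N_OBS):
            p_o, posterior = bayes_update(belief, a, o)
            if p_o > 0.0:
                # Filtered state: (posterior belief, remaining goal).
                risk += p_o * risk_value(posterior, goal - R[a][o], steps - 1)[0]
        if risk < best_risk:
            best_risk, best_action = risk, a
    return best_risk, best_action
```

Calling `risk_value((0.5, 0.5), 2, 3)` returns the minimal risk and a minimizing first action for a uniform initial belief, goal 2, and horizon 3. Beliefs are kept as tuples so that `lru_cache` can memoize the recursion; this exhaustive enumeration is only practical for small horizons and observation sets.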
Keywords
partially observable Markov decision processes, risk probability criterion, Bayes operator, the optimal policy