B$^3$RTDP: A Belief Branch and Bound Real-Time Dynamic Programming Approach to Solving POMDPs

arXiv (2022)

Abstract
Partially Observable Markov Decision Processes (POMDPs) offer a promising world representation for autonomous agents, as they can model both transitional and perceptual uncertainties. Calculating the optimal solution to a POMDP can be computationally expensive, as it requires reasoning over the (possibly infinite) space of beliefs. Several approaches have been proposed to overcome this difficulty, such as discretizing the belief space, point-based belief sampling, and Monte Carlo tree search. The Real-Time Dynamic Programming approach of the RTDP-Bel algorithm approximates the value function by storing it in a hash table with discretized belief keys. We propose an extension to the RTDP-Bel algorithm which we call Belief Branch and Bound RTDP (B$^3$RTDP). Our algorithm uses a bounded value function representation and takes advantage of this in two novel ways: a search-bounding technique based on action selection convergence probabilities, and a method for leveraging early action convergence called the \textit{Convergence Frontier}. Lastly, we empirically demonstrate that B$^3$RTDP can achieve greater returns in less time than the state-of-the-art SARSOP solver on known POMDP problems.
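The idea of storing a value function in a hash table keyed by discretized beliefs can be illustrated with a minimal sketch. It is not taken from the paper: the names `discretize` and `BeliefValueTable`, the rounding scheme, the `resolution` parameter, and the default value are all assumptions chosen for illustration, and RTDP-Bel's actual discretization may differ.

```python
from typing import Dict, Sequence, Tuple

BeliefKey = Tuple[int, ...]


def discretize(belief: Sequence[float], resolution: int = 10) -> BeliefKey:
    """Map a belief (probabilities over states) to a hashable grid key.

    Assumption: each probability is rounded to the nearest 1/resolution step.
    """
    return tuple(int(round(p * resolution)) for p in belief)


class BeliefValueTable:
    """Hypothetical value table: beliefs sharing a grid cell share one value."""

    def __init__(self, resolution: int = 10, default: float = 0.0) -> None:
        self.resolution = resolution
        self.default = default  # value returned for beliefs not yet visited
        self.table: Dict[BeliefKey, float] = {}

    def get(self, belief: Sequence[float]) -> float:
        return self.table.get(discretize(belief, self.resolution), self.default)

    def set(self, belief: Sequence[float], value: float) -> None:
        self.table[discretize(belief, self.resolution)] = value


table = BeliefValueTable(resolution=10)
table.set([0.52, 0.48], 3.0)
nearby = table.get([0.49, 0.51])  # falls in the same grid cell as [0.52, 0.48]
far = table.get([0.9, 0.1])       # unvisited cell, so the default is returned
```

In this sketch a coarser `resolution` makes more beliefs share an entry, which shrinks the table at the cost of a cruder value approximation.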
Keywords
belief branch, real-time