Expectation maximisation methods for solving (PO)MDPs and optimal control problems

Bayesian Time Series Models (2011)

Cited by 4 | Views 26
Abstract
As this book demonstrates, the development of efficient probabilistic inference techniques has made considerable progress in recent years, in particular with respect to exploiting the structure (e.g., factored, hierarchical or relational) of discrete and continuous problem domains. In this chapter we show that these techniques can also be used for solving Markov decision processes (MDPs) or partially observable MDPs (POMDPs) when they are formulated in terms of a structured dynamic Bayesian network (DBN).
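The core idea of the EM view of planning can be sketched for a tabular MDP: with nonnegative rewards treated as likelihoods, the E-step evaluates the current stochastic policy (computing Q-values by probabilistic inference over the DBN, here reduced to plain policy evaluation), and the M-step reweights the policy by those Q-values. The following is a minimal illustrative sketch, not code from the chapter; the two-state MDP, its transition probabilities, and its rewards are made-up numbers.

```python
# Hypothetical 2-state, 2-action MDP (numbers are illustrative, not from the chapter).
GAMMA = 0.9
S, A = 2, 2
# P[s][a][s2]: transition probability from state s to s2 under action a
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
# R[s][a]: nonnegative rewards (required so they can act as likelihoods)
R = [[0.1, 0.2], [0.3, 1.0]]

def q_values(pi):
    """E-step: evaluate Q^pi via iterative policy evaluation."""
    V = [0.0] * S
    for _ in range(500):  # enough sweeps for convergence at GAMMA = 0.9
        V = [sum(pi[s][a] * (R[s][a]
                 + GAMMA * sum(P[s][a][t] * V[t] for t in range(S)))
                 for a in range(A)) for s in range(S)]
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(S))
             for a in range(A)] for s in range(S)]

def em_step(pi):
    """M-step: pi_new(a|s) proportional to pi(a|s) * Q^pi(s,a)."""
    Q = q_values(pi)
    new_pi = []
    for s in range(S):
        w = [pi[s][a] * Q[s][a] for a in range(A)]
        z = sum(w)
        new_pi.append([x / z for x in w])
    return new_pi

pi = [[0.5, 0.5], [0.5, 0.5]]  # start from the uniform policy
for _ in range(100):
    pi = em_step(pi)
# After repeated EM steps the policy concentrates its mass on the
# higher-Q action in each state, i.e. EM converges toward a greedy policy.
```

Each EM step multiplies the odds of an action by the ratio of its Q-value to the alternatives', so the policy sharpens geometrically toward the optimal action without ever taking an explicit max, which is what lets standard inference machinery (message passing in the DBN) replace the Bellman backup.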
Keywords
Q-function, gradient ascent, Bellman equation, influence diagrams, reinforcement learning, maximum likelihood, unscented transform, computational statistics, Markov decision process, free energy, independent component analysis, maximum a posteriori, planning, loopy belief propagation, value iteration, Baum-Welch algorithm, hidden Markov model, Kullback-Leibler divergence