Adapting Reinforcement Learning Algorithms to Episodic Learning of POMDPs MS & E 338 Final Report Winter 2014-15

Semantic Scholar (2015)

Abstract
The problem considered in the paper is the joint learning and planning, or Reinforcement Learning (RL), problem for Partially Observable Markov Decision Processes (POMDPs) with unknown rewards and dynamics. We formulate an episodic learning problem with partial state knowledge. We then adapt UCRL and PSRL into heuristics for this episodic POMDP-RL problem in several ways and apply these heuristics to POMDP adaptations of difficult RL problems. We then present simulation results and analysis.

Formulation

Let (S, A, P, R, Ω, ρ, H, L) be a POMDP, where S is the state space, A the action set, P(·|s, a) the state transition probabilities, R the reward function, Ω the observation set, ρ the initial state distribution, H the episode length, and L the number of episodes. For episode ℓ ∈ {1, . . . , L}, the initial state is drawn as s_0 ∼ ρ and we observe o_0. For t ∈ {1, . . . , H} we let s_t, a_t, r_t and o_t respectively denote the state, action, reward and observation at time t. Each a_t is computed using a policy μ_t^ℓ : Ω → A that we determine at the start of the episode according to our algorithm, using the data from the previous ℓ − 1 episodes. We assume that the underlying MDP is time-homogeneous. We will consider problems with a finite L, with the goal
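The abstract does not spell out how UCRL and PSRL are adapted into heuristics for this setting. As a rough illustration only, the sketch below shows one naive way to run a PSRL-style heuristic over the episodic protocol described above: treat the observation o_t as if it were the state, maintain Dirichlet/Gaussian posteriors over an "MDP on observations", sample a model at the start of each episode, and execute the resulting greedy finite-horizon policy μ_t^ℓ : Ω → A. The `env` interface (`reset()`/`step()`), the priors, and the observations-as-states simplification are assumptions made for this sketch, not the paper's algorithm.

```python
import numpy as np

def psrl_on_observations(env, n_obs, n_actions, H, L, seed=0):
    """PSRL-style heuristic that treats observations as states.

    Assumes a hypothetical env with:
      obs = env.reset()                    # initial observation o_0
      next_obs, reward = env.step(action)  # one POMDP transition
    """
    rng = np.random.default_rng(seed)
    # Dirichlet(1) prior on P(o'|o, a) and a simple Gaussian model for rewards.
    trans_counts = np.ones((n_obs, n_actions, n_obs))
    rew_sum = np.zeros((n_obs, n_actions))
    rew_cnt = np.ones((n_obs, n_actions))

    for ep in range(L):
        # Thompson-sampling step: draw one model from the posterior.
        P = np.zeros((n_obs, n_actions, n_obs))
        for o in range(n_obs):
            for a in range(n_actions):
                P[o, a] = rng.dirichlet(trans_counts[o, a])
        R = rew_sum / rew_cnt + rng.normal(scale=1.0 / np.sqrt(rew_cnt))

        # Finite-horizon value iteration on the sampled model.
        Q = np.zeros((H, n_obs, n_actions))
        V_next = np.zeros(n_obs)
        for t in range(H - 1, -1, -1):
            Q[t] = R + P @ V_next            # Q[t][o, a] = R(o,a) + sum_o' P(o'|o,a) V_next(o')
            V_next = Q[t].max(axis=1)

        # Execute the greedy policy mu_t(o) = argmax_a Q[t, o, a] for one episode,
        # then fold the observed transitions and rewards back into the posterior.
        obs = env.reset()
        for t in range(H):
            action = int(np.argmax(Q[t, obs]))
            next_obs, reward = env.step(action)
            trans_counts[obs, action, next_obs] += 1
            rew_sum[obs, action] += reward
            rew_cnt[obs, action] += 1
            obs = next_obs
```

Under the same assumptions, a UCRL-style variant would replace the posterior draw with an optimistic model chosen from confidence sets around the empirical estimates; the per-episode planning step and the interaction loop would remain unchanged.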