Reinforcement learning in non-Markovian environments

Siddharth Chandak, Pratik Shah, Vivek S. Borkar, Parth Dodhia

Systems & Control Letters (2024)

Abstract
Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by the non-Markovianity of observations when the Q-learning algorithm is applied to this formulation. Based on this observation, we propose that the criterion for agent design should be to seek good approximations of certain conditional laws. Inspired by classical stochastic control, we show that our problem reduces to the recursive computation of approximate sufficient statistics. This leads to an autoencoder-based scheme for agent design, which is then numerically tested on partially observed reinforcement learning environments.
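
The autoencoder-based scheme lends itself to a compact illustration. Below is a minimal sketch (our own, not the paper's code): a GRU cell stands in for the recursive map that updates the approximate sufficient statistic z_t from the observation-action history, a reconstruction head ties z_t to the observation process in autoencoder fashion, and Q-learning then operates on z_t instead of the raw non-Markovian observations. The architecture, names, dimensions, and the choice of PyTorch are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class RecurrentAutoencoder(nn.Module):
    """Recursively compresses the observation/action history into a
    fixed-size vector z_t, used as an approximate sufficient statistic."""

    def __init__(self, obs_dim: int, act_dim: int, latent_dim: int):
        super().__init__()
        # Recursive update z_t = f(z_{t-1}, o_t, a_{t-1}): a GRU cell
        # plays the role of the recursively computed statistic.
        self.rnn = nn.GRUCell(obs_dim + act_dim, latent_dim)
        # The decoder reconstructs the current observation from z_t;
        # training on the reconstruction loss pushes z_t to retain
        # information about the conditional law of the observations.
        self.decoder = nn.Linear(latent_dim, obs_dim)

    def step(self, z, obs, prev_action):
        z_next = self.rnn(torch.cat([obs, prev_action], dim=-1), z)
        return z_next, self.decoder(z_next)


# Usage sketch: maintain z_t online and run Q-learning on z_t rather
# than on the raw observations.
enc = RecurrentAutoencoder(obs_dim=4, act_dim=2, latent_dim=16)
q_net = nn.Linear(16, 2)          # Q-values for 2 discrete actions

z = torch.zeros(1, 16)            # initial statistic z_0
obs = torch.randn(1, 4)           # placeholder observation o_t
prev_a = torch.zeros(1, 2)        # one-hot of the previous action
z, recon = enc.step(z, obs, prev_a)
q_values = q_net(z)               # TD/Q-learning updates act on z
```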
Keywords
Agent design, Curse of non-Markovianity, Recursively computed sufficient statistics, Q-learning, Partially observed MDP