An Empirical Relative Value Learning Algorithm for Non-Parametric MDPs with Continuous State Space

2019 18th European Control Conference (ECC)

Cited by 5 | Views 31
Abstract
We propose an empirical relative value learning (ERVL) algorithm for non-parametric MDPs with continuous state space, finite actions, and the average reward criterion. The ERVL algorithm relies on nearest-neighbor function approximation and minibatch sampling for its value function updates. It is universal (it works for any such MDP), computationally simple, and yet provides an arbitrarily good approximation with high probability in finite time. To the best of our knowledge, this is the first algorithm for non-parametric, continuous-state-space MDPs under the average reward criterion with these provable properties. Numerical evaluation on a benchmark optimal replacement problem suggests good performance.
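To make the abstract's description concrete, the following is a minimal sketch of an ERVL-style update, not the paper's exact algorithm: it assumes a one-dimensional state space, a hypothetical simulate(s, a, n) sampler standing in for the MDP, 1-nearest-neighbor value interpolation over a fixed grid of anchor states, and minibatch Monte Carlo estimates of the expected next-state value.

```python
import numpy as np

def nn_value(v, anchors, s):
    """Evaluate the nearest-neighbor approximation of v at state s."""
    i = np.argmin(np.abs(anchors - s))   # 1-D state space for simplicity
    return v[i]

def ervl_iteration(v, anchors, actions, simulate, batch_size, ref_idx=0):
    """One empirical relative value iteration sweep over the anchor states.

    `simulate(s, a, n)` must return (next_states, rewards) as length-n arrays.
    """
    v_new = np.empty_like(v)
    for i, s in enumerate(anchors):
        q_values = []
        for a in actions:
            # Minibatch Monte Carlo estimate of r(s, a) + E[v(s')]
            next_states, rewards = simulate(s, a, batch_size)
            q_values.append(rewards.mean()
                            + np.mean([nn_value(v, anchors, s2)
                                       for s2 in next_states]))
        v_new[i] = max(q_values)
    # "Relative" normalization: subtract the value at a reference state
    return v_new - v_new[ref_idx]

# Toy usage on a machine-replacement model (hypothetical dynamics):
# state = wear level in [0, 1]; action 0 = keep running, 1 = replace.
rng = np.random.default_rng(0)

def simulate(s, a, n):
    if a == 1:                           # replace: reset wear, pay a fixed cost
        return np.zeros(n), np.full(n, -1.0)
    s2 = np.clip(s + rng.uniform(0.0, 0.2, n), 0.0, 1.0)
    return s2, -s2                       # operating cost grows with wear

anchors = np.linspace(0.0, 1.0, 51)
v = np.zeros_like(anchors)
for _ in range(200):
    v = ervl_iteration(v, anchors, actions=[0, 1], simulate=simulate,
                       batch_size=32)
```

Subtracting the value at a fixed reference state is the "relative" part of relative value iteration: under the average reward criterion the raw iterates grow roughly linearly at the optimal gain, and the normalization keeps them bounded.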
Keywords
empirical relative value learning algorithm,continuous state space,finite actions,average reward criterion,ERVL algorithm,function approximation,nonparametric MDP