Learning to Score Behaviors for Guided Policy Optimization

ICML, pp. 7445-7454, 2020.


Abstract

We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space. We show that by utilizing the dual formulation of the WD, we can learn score functions over policy behaviors that can in turn be used to lead policy optimization towards (or away from) (u...
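
As a rough illustration of the idea of a dual "score function", the sketch below works in the simplest possible setting, which is an assumption and not the paper's setup: each behavior is embedded as a single scalar (for example, an episode return), and both policies contribute the same number of samples. In that case the 1-Wasserstein distance has a closed form via sorting, and by Kantorovich-Rubinstein duality any 1-Lipschitz score function gives a lower bound on it. The names `wasserstein_1d`, `dual_value`, `behaviors_a`, and `behaviors_b` are hypothetical; the paper's actual embeddings, cost function, and optimization procedure are not given in this excerpt.

```python
def wasserstein_1d(xs, ys):
    """Exact 1-Wasserstein distance between two equal-size samples on the real line."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("expected two non-empty samples of equal size")
    # In one dimension the optimal coupling matches sorted samples in order.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def dual_value(score, xs, ys):
    """Dual objective: mean score over xs minus mean score over ys."""
    return sum(map(score, xs)) / len(xs) - sum(map(score, ys)) / len(ys)


# Hypothetical scalar behavior embeddings of two policies (assumed data).
behaviors_a = [0.0, 1.0, 2.0]
behaviors_b = [1.0, 2.0, 6.0]

primal = wasserstein_1d(behaviors_a, behaviors_b)
# score(x) = -x is 1-Lipschitz, so its dual value cannot exceed the distance.
bound = dual_value(lambda x: -x, behaviors_a, behaviors_b)
print(primal, bound)
```

In this toy case the chosen score function happens to attain the distance exactly. In general one would search over a family of score functions to tighten the bound, and the resulting scores indicate which behaviors distinguish one policy from the other.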
