Stable Control Policy and Transferable Reward Function via Inverse Reinforcement Learning.

ICCAI (2023)

Abstract
Inverse reinforcement learning (IRL) can solve the problem of complex reward function shaping by learning from expert data. However, it is challenging to train when expert data are insufficient, and its stability is difficult to guarantee. Moreover, the reward function learned by mainstream IRL can only adapt to subtle environmental changes; it cannot be transferred directly to a similar task scenario, so its generalization ability still needs to be improved. To address these issues, we propose an IRL algorithm that obtains a stable control policy and a transferable reward function (ST-IRL). First, by introducing the Wasserstein metric and adversarial training, we address the difficulty of training IRL in a new environment with little expert data. Second, we add state marginal matching (SMM), hyperparameter comparison, and optimizer evaluation to improve the model's generalization. As a result, the control policy obtained by ST-IRL achieves outstanding control results on all four MuJoCo benchmarks. Furthermore, in both the custom Ant and PointMaze environments, the reward function obtained by our algorithm exhibits promising transferability.
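To make the Wasserstein-based adversarial training step concrete, the sketch below shows a generic critic (discriminator) update of the kind such methods rely on: the critic is trained to separate expert states from policy states under an approximate 1-Lipschitz constraint, and its output can then serve as a learned reward. This is a minimal, assumption-laden illustration (the network sizes, gradient-penalty coefficient `lambda_gp`, state-only inputs, and all function names are placeholders), not the paper's actual ST-IRL implementation.

```python
# Hypothetical sketch of a Wasserstein-critic update for adversarial IRL.
# Architecture and hyperparameters are illustrative assumptions, not ST-IRL's.
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Maps a state to a scalar score used as the learned reward."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)


def gradient_penalty(critic, expert, policy):
    """Gradient penalty keeping the critic approximately 1-Lipschitz,
    as required by the Wasserstein-1 dual formulation."""
    alpha = torch.rand(expert.size(0), 1)
    mixed = (alpha * expert + (1 - alpha) * policy).requires_grad_(True)
    grads = torch.autograd.grad(critic(mixed).sum(), mixed, create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1.0) ** 2).mean()


def critic_step(critic, opt, expert_states, policy_states, lambda_gp=10.0):
    """Maximize E_expert[f(s)] - E_policy[f(s)]; the trained f(s) is then
    used as the reward signal in the subsequent policy-improvement step."""
    opt.zero_grad()
    wasserstein_gap = critic(expert_states).mean() - critic(policy_states).mean()
    loss = -wasserstein_gap + lambda_gp * gradient_penalty(critic, expert_states, policy_states)
    loss.backward()
    opt.step()
    return loss.item()


if __name__ == "__main__":
    obs_dim = 8
    critic = Critic(obs_dim)
    opt = torch.optim.Adam(critic.parameters(), lr=3e-4)
    expert = torch.randn(256, obs_dim)         # stand-in for expert states
    policy = torch.randn(256, obs_dim) + 0.5   # stand-in for current policy states
    for _ in range(5):
        print(critic_step(critic, opt, expert, policy))
```

In this reading, the gradient penalty is what keeps the adversarial game well behaved with little expert data, and scoring states (rather than a policy-specific quantity) is what would let the learned reward transfer to a similar task; how ST-IRL actually combines this with state marginal matching is detailed in the paper itself.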