W-IRL: Inverse Reinforcement Learning via Wasserstein Metric

2023 3rd International Conference on Computer, Control and Robotics (ICCCR)(2023)

In control applications based on reinforcement learning, setting the reward function manually requires profound professional knowledge and engineering experience. Therefore, to reduce the difficulty of reward design and popularize reinforcement learning in the control domain, it is necessary to introduce inverse reinforcement learning (IRL), which constructs the reward function from expert data rather than relying on a hand-designed reward. However, since control tasks often involve complex scenarios with high-dimensional continuous state spaces, insufficient expert data can cause IRL algorithms to suffer from difficult training, slow convergence, and poor control performance. To address these problems, this paper proposes an inverse reinforcement learning algorithm based on the Wasserstein metric (W-IRL). First, to address difficult training and slow convergence, the Wasserstein metric is used as the loss function to alleviate the gradient vanishing and explosion caused by training on high-dimensional data. Second, to improve the control performance of IRL, an adversarial approach is adopted in which the reward function acts as the discriminator and the control policy as the generator, and the idea of state marginal matching (SMM) is incorporated into the training process to improve the stability of the model. Experiments are conducted in four MuJoCo simulation environments with high-dimensional spaces and a small amount of expert data. The results show that the proposed method outperforms mainstream IRL algorithms in control performance, robustness, and training stability.
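The adversarial setup described in the abstract (reward function as a Wasserstein-style critic, control policy as generator) can be illustrated with a toy critic update. This is a minimal sketch under stated assumptions: the linear critic `f(s) = w·s`, the weight-clipping Lipschitz constraint, the learning rate, and the synthetic state batches are all illustrative choices, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for expert demonstrations and current-policy rollouts
# (in the paper these would come from MuJoCo trajectories).
expert_states = rng.normal(1.0, 0.5, size=(256, 4))
policy_states = rng.normal(0.0, 0.5, size=(256, 4))

# Linear critic f(s) = w·s, playing the role of the reward/discriminator.
w = np.zeros(4)
lr, clip = 0.05, 0.1

for _ in range(100):
    # Wasserstein-1 objective: maximize E_expert[f(s)] - E_policy[f(s)].
    # For a linear critic the gradient is just the difference of means.
    grad = expert_states.mean(axis=0) - policy_states.mean(axis=0)
    w += lr * grad
    # Crude WGAN-style weight clipping to keep f approximately 1-Lipschitz.
    w = np.clip(w, -clip, clip)

# The critic gap estimates how far the policy's state distribution is
# from the expert's; the generator (policy) would be trained to shrink it.
gap = (expert_states @ w).mean() - (policy_states @ w).mean()
```

In a full W-IRL loop, this critic update would alternate with policy-improvement steps that treat `f(s)` as a reward signal, which is where the state-marginal-matching idea enters.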
inverse reinforcement learning, Wasserstein metric, reward function, adversarial training