Guiding Offline RL using a Safety Expert

Proceedings of the 7th Joint International Conference on Data Science and Management of Data (CODS-COMAD 2024)

Abstract
Offline reinforcement learning is used to train policies in situations where it is expensive or infeasible to access the environment during training. An agent trained in such a setting receives no corrective feedback once the learned policy starts diverging and may fall prey to the overestimation bias commonly seen in offline RL. This increases the chances of the agent choosing potentially unsafe actions, especially in states that are insufficiently represented in the training dataset. In this paper, we explore the problem of acting safely in sparsely observed regions of the state space. We propose to leverage a safety expert to nudge an offline RL agent towards choosing safe actions in states that are under-represented in the dataset. The proposed framework transfers the safety expert's knowledge into the offline setting for states with high uncertainty, preventing catastrophic failures in safety-critical domains. We use a simple but effective approach to quantify state uncertainty based on how frequently a state appears in the training dataset. In states with high uncertainty, the offline RL agent mimics the safety expert; in all other states, it maximizes the long-term reward. Our approach is plug-and-play, i.e., any existing value-based or actor-critic style offline RL algorithm can be guided by a safety expert. We finally show that such guided offline RL algorithms can outperform their state-of-the-art counterparts, reducing the chance of taking unsafe actions while retaining competitive performance.
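As a rough illustration of the switching rule described in the abstract, the sketch below estimates state uncertainty with a simple visit count over discretized states and follows the safety expert whenever the count falls under a threshold, otherwise deferring to the offline RL policy. This is a minimal sketch under assumed details, not the authors' implementation; the names `discretize`, `MIN_COUNT`, `offline_policy`, and `safety_expert` are illustrative placeholders.

```python
# Hypothetical count-based gating between a safety expert and an offline RL
# policy. All names and thresholds here are assumptions for illustration.
from collections import Counter
import numpy as np

MIN_COUNT = 10  # assumed threshold: fewer visits => "high uncertainty" state

def discretize(state, bins=10, low=-1.0, high=1.0):
    """Map a continuous state vector to a coarse tuple key for counting."""
    clipped = np.clip(state, low, high)
    idx = np.floor((clipped - low) / (high - low) * bins).astype(int)
    return tuple(np.minimum(idx, bins - 1))

def build_state_counts(dataset_states, bins=10):
    """Count how often each discretized state appears in the offline dataset."""
    return Counter(discretize(s, bins) for s in dataset_states)

def guided_action(state, counts, offline_policy, safety_expert, bins=10):
    """Mimic the safety expert in under-represented states, else act greedily."""
    if counts[discretize(state, bins)] < MIN_COUNT:
        return safety_expert(state)   # high uncertainty: defer to the safety expert
    return offline_policy(state)      # well-covered state: maximize long-term reward

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dataset_states = rng.normal(0.0, 0.3, size=(5000, 2))  # toy offline dataset
    counts = build_state_counts(dataset_states)

    offline_policy = lambda s: np.array([1.0, 0.0])   # stand-in RL policy
    safety_expert = lambda s: np.array([0.0, 0.0])    # stand-in safe action

    print(guided_action(np.array([0.1, -0.2]), counts, offline_policy, safety_expert))
    print(guided_action(np.array([0.95, 0.95]), counts, offline_policy, safety_expert))
```

Because the gate only decides which policy supplies the action, this kind of rule can wrap any value-based or actor-critic offline RL agent, which is what gives the framework its plug-and-play character.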
Keywords
offline reinforcement learning, reinforcement learning, transfer learning