Offline Reinforcement Learning with Behavioral Supervisor Tuning
arXiv (2024)
Abstract
Offline reinforcement learning (RL) algorithms are applied to learn
performant, well-generalizing policies when provided with a static dataset of
interactions. Many recent approaches to offline RL have achieved substantial
success, but with one key caveat: they demand extensive per-dataset
hyperparameter tuning to reach their reported performance, and evaluating each
configuration requires policy rollouts in the environment, which can rapidly
become cumbersome. Furthermore, heavy tuning requirements can hamper the adoption of these
algorithms in practical domains. In this paper, we present TD3 with Behavioral
Supervisor Tuning (TD3-BST), an algorithm that trains an uncertainty model and
uses it to guide the policy to select actions within the dataset support.
TD3-BST can learn more effective policies from offline datasets compared to
previous methods and achieves the best performance across challenging
benchmarks without requiring per-dataset tuning.
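The abstract describes the core idea only at a high level: an uncertainty model scores state-action pairs, and the policy is steered toward actions inside the dataset support. The snippet below is a minimal, hypothetical sketch of that general pattern, not the TD3-BST method itself. It assumes a nearest-neighbour Gaussian-kernel "certainty" score as a stand-in for the learned uncertainty model, and an actor loss that trades off a Q-value against a penalty of (1 - certainty). The names `certainty` and `actor_loss`, and the `bandwidth` and `alpha` parameters, are illustrative assumptions.

```python
import math

def certainty(state, action, dataset, bandwidth=0.1):
    """Hypothetical support score in (0, 1].

    Stand-in for a learned uncertainty model (an assumption, not the paper's):
    returns the largest Gaussian-kernel similarity between (state, action)
    and any (s_i, a_i) pair in the dataset, so pairs seen in the data score 1.0
    and pairs far from all data score close to 0.
    """
    best = 0.0
    for s_i, a_i in dataset:
        sq = sum((x - y) ** 2 for x, y in zip(state, s_i))
        sq += sum((x - y) ** 2 for x, y in zip(action, a_i))
        best = max(best, math.exp(-sq / bandwidth))
    return best

def actor_loss(q_value, state, action, dataset, alpha=1.0):
    """Hypothetical actor objective: maximize Q while penalizing
    actions that the certainty score places outside the dataset support."""
    return -q_value + alpha * (1.0 - certainty(state, action, dataset))

# Toy dataset of (state, action) pairs; values are illustrative only.
dataset = [((0.0,), (0.5,)), ((1.0,), (-0.5,))]
in_support = actor_loss(1.0, (0.0,), (0.5,), dataset)
out_of_support = actor_loss(1.0, (0.0,), (2.0,), dataset)
```

With equal Q-values, the in-support action gets the lower loss, so gradient descent on this objective would favour actions near the data. Note that this sketch still exposes `alpha` and `bandwidth` as free hyperparameters; avoiding such per-dataset tuning is precisely what the abstract claims TD3-BST achieves, by means the abstract does not detail.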