Generalized Inverse Reinforcement Learning

Semantic Scholar (2017)

Abstract
Inverse Reinforcement Learning (IRL) is used to teach behaviors to agents by having them learn a reward function from example trajectories. The underlying assumption is usually that these trajectories represent optimal behavior. However, it is not always possible for a user to provide examples of optimal trajectories. This problem has previously been tackled by labeling trajectories with scores that indicate bad as well as good behavior. In this work, we formalize the IRL problem in a generalized framework that allows learning from good and bad demonstrations, and everything in between. In our framework, users can score entire trajectories (as usual) as well as particular state-action pairs. This additional control allows the agent to learn preferred behaviors from a relatively small number of trajectories. We expect this framework to be especially useful in robotics domains, where the user can collect fewer trajectories at the cost of labeling bad state-action pairs, which may be easier than maneuvering a robot to collect additional (entire) trajectories.
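To make the idea of learning from scored demonstrations concrete, the sketch below is a minimal illustration, not the paper's actual algorithm: it assumes a linear reward R(s, a) = w · φ(s, a) and fits w by a least-squares gradient step that pushes each demonstration's return toward its user-provided score. The function names, the feature map phi, and the regression objective are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical sketch of score-weighted reward learning; an illustration
# of the general idea from the abstract, not the paper's actual method.
# Reward is assumed linear in features: R(s, a) = w . phi(s, a).

def feature_counts(trajectory, phi):
    """Sum phi(s, a) over the state-action pairs of one trajectory."""
    return np.sum([phi(s, a) for s, a in trajectory], axis=0)

def learn_reward(trajectories, scores, phi, dim, lr=0.01, epochs=500):
    """Fit w so that each demonstration's return tracks its user score.

    trajectories: list of [(state, action), ...]; a single scored
                  state-action pair is just a length-one trajectory
    scores:       user labels, e.g. in [-1, 1], negative = bad behavior
    phi:          assumed feature map, phi(state, action) -> array of len dim
    """
    counts = np.array([feature_counts(t, phi) for t in trajectories])
    scores = np.asarray(scores, dtype=float)
    w = np.zeros(dim)
    for _ in range(epochs):
        returns = counts @ w                      # predicted return per demo
        w += lr * counts.T @ (scores - returns)   # least-squares gradient step
    return w

# Toy usage on a two-feature stand-in problem (all values made up):
phi = lambda s, a: np.array([float(s == "goal"), float(a == "crash")])
good = [("start", "go"), ("goal", "stop")]        # scored +1
bad = [("start", "crash")]                        # scored -1
w = learn_reward([good, bad], [1.0, -1.0], phi, dim=2)
```

Note that the finer-grained labels the abstract mentions fit naturally into this sketch: a state-action pair scored in isolation is simply treated as a length-one trajectory.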