Inverse Reinforcement Learning by Estimating Expertise of Demonstrators
CoRR (2024)
Abstract
In Imitation Learning (IL), utilizing suboptimal and heterogeneous
demonstrations presents a substantial challenge due to the varied nature of
real-world data. Yet standard IL algorithms treat these datasets as
homogeneous, thereby inheriting the deficiencies of suboptimal demonstrators.
Previous approaches to this issue typically rely on impractical assumptions
like high-quality data subsets, confidence rankings, or explicit environmental
knowledge. This paper introduces IRLEED, Inverse Reinforcement Learning by
Estimating Expertise of Demonstrators, a novel framework that overcomes these
hurdles without prior knowledge of demonstrator expertise. IRLEED enhances
existing Inverse Reinforcement Learning (IRL) algorithms by combining a general
model for demonstrator suboptimality to address reward bias and action
variance, with a Maximum Entropy IRL framework to efficiently derive the
optimal policy from diverse, suboptimal demonstrations. Experiments in both
online and offline IL settings, with simulated and human-generated data,
demonstrate IRLEED's adaptability and effectiveness, making it a versatile
solution for learning from suboptimal demonstrations.
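The suboptimality model described above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's exact formulation: each demonstrator i is modeled with a Boltzmann (maximum-entropy) policy over a shared true reward, where a per-demonstrator inverse temperature `beta` captures action variance (expertise) and a per-demonstrator `bias` term captures systematic reward bias. The names `beta`, `bias`, and `boltzmann_policy` are illustrative, not from the paper.

```python
import numpy as np

def boltzmann_policy(reward, beta, bias=None):
    """Single-state action distribution pi(a) ∝ exp(beta * (reward + bias)).

    beta: inverse temperature -- higher means a more expert (less noisy)
          demonstrator; bias: per-demonstrator systematic reward error.
    (Hypothetical sketch of the demonstrator model, not IRLEED's exact form.)
    """
    logits = beta * (reward + (0.0 if bias is None else bias))
    logits = logits - logits.max()  # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# True reward over three actions in one state; action 2 is truly best.
r = np.array([0.0, 0.5, 1.0])

expert = boltzmann_policy(r, beta=10.0)   # low action variance
novice = boltzmann_policy(r, beta=1.0)    # high action variance
biased = boltzmann_policy(r, beta=10.0,   # confident but systematically wrong
                          bias=np.array([1.5, 0.0, 0.0]))

print(expert.round(3), novice.round(3), biased.round(3))
```

Under this kind of model, an IRL learner that jointly estimates the shared reward together with each demonstrator's `beta` and `bias` can discount noisy or biased demonstrations instead of averaging over them, which is the intuition behind estimating demonstrator expertise.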