Aux-AIRL: End-to-End Self-Supervised Reward Learning for Extrapolating beyond Suboptimal Demonstrations

2021

Abstract
Real-world human demonstrations are often suboptimal. How to extrapolate beyond suboptimal demonstrations is an important open research question. In this ongoing work, we analyze the success of a previous state-of-the-art self-supervised reward learning method that requires four sequential optimization steps, and propose a simple end-to-end imitation learning method, Aux-AIRL, that extrapolates from suboptimal demonstrations without requiring multiple optimization steps.
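The abstract does not describe Aux-AIRL's objective in detail, but as a rough illustration of what replacing "four sequential optimization steps" with end-to-end training could look like, the sketch below jointly optimizes an AIRL-style discriminator loss and an auxiliary self-supervised term in a single gradient step. The network architecture, the temporal-ranking auxiliary loss, and the `aux_weight` coefficient are illustrative assumptions, not the paper's actual method.

```python
# Illustrative sketch only (not the paper's implementation): one end-to-end update
# combining an AIRL-style adversarial loss with an auxiliary self-supervised loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardNet(nn.Module):
    """Maps a state-action pair to a scalar reward logit."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def joint_step(reward_net, optimizer, expert_batch, policy_batch, aux_batch, aux_weight=0.1):
    """Single optimizer step over the combined (adversarial + auxiliary) objective."""
    # Adversarial term: demonstration pairs should score above policy-generated pairs.
    r_exp = reward_net(*expert_batch)
    r_pol = reward_net(*policy_batch)
    adv_loss = (F.binary_cross_entropy_with_logits(r_exp, torch.ones_like(r_exp))
                + F.binary_cross_entropy_with_logits(r_pol, torch.zeros_like(r_pol)))

    # Hypothetical auxiliary self-supervised term: temporal ranking, i.e. later
    # trajectory snippets should receive no lower reward than earlier ones.
    (obs_early, act_early), (obs_late, act_late) = aux_batch
    aux_loss = F.relu(reward_net(obs_early, act_early)
                      - reward_net(obs_late, act_late)).mean()

    loss = adv_loss + aux_weight * aux_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```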