Imitation Learning as f-Divergence Minimization
CoRR (2019)
Abstract
We address the problem of imitation learning with multi-modal demonstrations.
Instead of attempting to learn all modes, we argue that in many tasks it is
sufficient to imitate any one of them. We show that state-of-the-art
methods such as GAIL and behavior cloning, due to their choice of loss
function, often incorrectly interpolate between such modes. Our key insight is
to minimize the right divergence between the learner and the expert
state-action distributions, namely the reverse KL divergence or I-projection.
We propose a general imitation learning framework for estimating and minimizing
any f-divergence. By plugging in different divergences, we are able to recover
existing algorithms such as Behavior Cloning (Kullback-Leibler), GAIL (Jensen-Shannon),
and DAgger (Total Variation). Empirical results show that our
approximate I-projection technique is able to imitate multi-modal behaviors
more reliably than GAIL and behavior cloning.
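The framework's central quantity is the f-divergence, which for discrete distributions is D_f(P || Q) = Σ_x Q(x) f(P(x)/Q(x)) for a convex generator f with f(1) = 0. A minimal sketch of how different generators recover the divergences named in the abstract; the distributions and names below are illustrative assumptions, not taken from the paper:

```python
import math

def f_divergence(p, q, f):
    """Discrete f-divergence D_f(P || Q) = sum_x q(x) * f(p(x) / q(x))."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q) if qx > 0)

# Standard generator functions f (each satisfies f(1) = 0, so D_f(P || P) = 0).
GENERATORS = {
    # forward KL(P || Q): mode-covering M-projection, as in behavior cloning
    "forward_kl": lambda t: t * math.log(t) if t > 0 else 0.0,
    # reverse KL(P || Q) = KL(Q || P): mode-seeking I-projection
    "reverse_kl": lambda t: -math.log(t),
    # Jensen-Shannon divergence, the divergence associated with GAIL
    "jensen_shannon": lambda t: 0.5 * (
        (t * math.log(2 * t / (1 + t)) if t > 0 else 0.0)
        + math.log(2 / (1 + t))
    ),
    # total variation distance, the divergence associated with DAgger
    "total_variation": lambda t: 0.5 * abs(t - 1),
}

p = [0.5, 0.5]    # illustrative "expert" state-action distribution
q = [0.75, 0.25]  # illustrative "learner" distribution
for name, f in GENERATORS.items():
    print(f"{name}: {f_divergence(p, q, f):.4f}")
```

With strictly positive distributions as above, each generator yields a finite value; on distributions with disjoint support the KL variants diverge, which is what drives the mode-seeking behavior of the reverse KL (I-projection) the abstract advocates.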
Keywords
learning, f-divergence