Trajectory VAE for multi-modal imitation

user-5e8423bd4c775ee160ac3e1a(2018)

引用 0|浏览5
暂无评分
摘要
We address the problem of imitating multi-modal expert demonstrations in sequential decision making problems. In many practical applications, for example video games, behavioural demonstrations are readily available that contain multi-modal structure not captured by typical existing imitation learning approaches. For example, differences in the observed players' behaviours may be representative of different underlying playstyles. In this paper, we use a generative model to capture different emergent playstyles in an unsupervised manner, enabling the imitation of a diverse range of distinct behaviours. We utilise a variational autoencoder to learn an embedding of the different types of expert demonstrations on the trajectory level, and jointly learn a latent representation with a policy. In experiments on a range of 2D continuous control problems representative of Minecraft environments, we empirically demonstrate that our model can capture a multi-modal structured latent space from the demonstrated behavioural trajectories.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要