Jointly Learning "What" and "How" from Instructions and Goal-States.

ICLR 2018

Abstract
Training agents to follow instructions requires some way of rewarding them for behavior which accomplishes the intent of the instruction. For non-trivial instructions, which may be either underspecified or contain some ambiguity, it can be difficult or impossible to specify a reward function or obtain relatable expert trajectories for the agent to imitate. For these scenarios, we introduce a method which requires only pairs of instructions and examples of goal states, from which we can jointly learn a model of the instruction-conditional reward and a policy which executes instructions. Our experiments in a gridworld compare the effectiveness of our method with that of RL in a control setting with available ground-truth reward. We furthermore evaluate the generalization of our approach to unseen instructions, and to scenarios where environment dynamics change outside of training, requiring fine-tuning of the policy “in the wild”.
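The abstract describes jointly learning an instruction-conditional reward model from pairs of instructions and goal-state examples, together with a policy trained against that learned reward. The sketch below is a minimal, hypothetical illustration of such a setup, not the authors' algorithm: it assumes a discriminator-style reward model that treats goal states as positive examples and policy-visited states as negatives, and a REINFORCE-style policy update that uses the learned reward as its return. All dimensions, embeddings, and the random placeholder data are invented for the example.

```python
# Hypothetical sketch of joint reward-model / policy training from
# (instruction, goal-state) pairs. Not the paper's exact method.
import torch
import torch.nn as nn

STATE_DIM, INSTR_DIM, N_ACTIONS = 16, 8, 4  # toy sizes, chosen arbitrarily

class RewardModel(nn.Module):
    """Scores how well a state satisfies an instruction (trained on goal states)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + INSTR_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, state, instr):
        return self.net(torch.cat([state, instr], dim=-1)).squeeze(-1)

class Policy(nn.Module):
    """Instruction-conditional policy over a small discrete action set."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + INSTR_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))
    def forward(self, state, instr):
        logits = self.net(torch.cat([state, instr], dim=-1))
        return torch.distributions.Categorical(logits=logits)

reward_model, policy = RewardModel(), Policy()
opt_r = torch.optim.Adam(reward_model.parameters(), lr=1e-3)
opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    # Reward-model update: goal-state examples labeled 1, policy-visited states 0.
    instr = torch.randn(32, INSTR_DIM)           # placeholder instruction embeddings
    goal_states = torch.randn(32, STATE_DIM)     # placeholder goal-state examples
    visited_states = torch.randn(32, STATE_DIM)  # placeholder states from rollouts
    logits = reward_model(torch.cat([goal_states, visited_states]), instr.repeat(2, 1))
    labels = torch.cat([torch.ones(32), torch.zeros(32)])
    loss_r = bce(logits, labels)
    opt_r.zero_grad(); loss_r.backward(); opt_r.step()

    # Policy update: REINFORCE, using the learned reward as the return signal.
    states = torch.randn(32, STATE_DIM)          # placeholder environment states
    dist = policy(states, instr)
    actions = dist.sample()
    with torch.no_grad():
        r = torch.sigmoid(reward_model(states, instr))  # learned reward in [0, 1]
    loss_pi = -(dist.log_prob(actions) * r).mean()
    opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()
```

In this illustration the two updates alternate every step; in practice a real environment would supply the visited states, and the policy would be trained with a proper RL algorithm rather than this single-step REINFORCE surrogate.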