Programmatic Imitation Learning from Unlabeled and Noisy Demonstrations
arXiv (Cornell University), 2023
Abstract
Imitation Learning (IL) is a promising paradigm for teaching robots to
perform novel tasks using demonstrations. Most existing approaches for IL
utilize neural networks (NNs); however, these methods suffer from several
well-known limitations: they 1) require large amounts of training data, 2) are
hard to interpret, and 3) are hard to repair and adapt. There is an emerging
interest in programmatic imitation learning (PIL), which offers significant
promise in addressing the above limitations. In PIL, the learned policy is
represented in a programming language, making it amenable to interpretation and
repair. However, state-of-the-art PIL algorithms assume access to action labels
and struggle to learn from noisy real-world demonstrations. In this paper, we
propose PLUNDER, a novel PIL algorithm that integrates a probabilistic program
synthesizer in an iterative Expectation-Maximization (EM) framework to address
these shortcomings. Unlike existing PIL approaches, PLUNDER synthesizes
probabilistic programmatic policies that are particularly well-suited for
modeling the uncertainties inherent in real-world demonstrations. Our approach
leverages an EM loop to simultaneously infer the missing action labels and the
most likely probabilistic policy. We benchmark PLUNDER against several
established IL techniques, and demonstrate its superiority across five
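challenging imitation learning tasks under noise. PLUNDER policies achieve 95%
accuracy in matching the given demonstrations, outperforming the next best
baseline by 19%. Additionally, policies generated by PLUNDER successfully
complete the tasks 17% more frequently than the nearest baseline.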
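The EM loop described above alternates between inferring the missing action labels and re-fitting a probabilistic programmatic policy. The Python sketch below illustrates that alternation on a hypothetical one-dimensional toy problem; it is not PLUNDER's implementation. Everything in it is an assumption for illustration: the two actions `ACC`/`DEC`, the program template `if x > threshold then DEC else ACC` whose condition is relaxed by a sigmoid with a fixed `SLOPE`, the known Gaussian observation model (`MEAN`, `NOISE_STD`), a per-step label posterior in place of inference over whole action sequences, and an enumeration over candidate thresholds standing in for the probabilistic program synthesizer.

```python
import math
import random

# Toy assumptions (hypothetical, not from the paper): two high-level actions,
# a scalar observation x per step, and a noisy low-level measurement y whose
# mean depends on the hidden action. Only (x, y) pairs are observed.
ACTIONS = ("ACC", "DEC")
MEAN = {"ACC": 1.0, "DEC": -1.0}  # assumed known observation model
NOISE_STD = 0.5
SLOPE = 4.0  # assumed fixed steepness of the sigmoid-relaxed condition


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def policy_prob(threshold, x):
    """Probabilistic program `if x > threshold then DEC else ACC`:
    DEC is chosen with probability sigmoid(SLOPE * (x - threshold))."""
    z = SLOPE * (x - threshold)
    return {"ACC": sigmoid(-z), "DEC": sigmoid(z)}


def gaussian_pdf(y, mean, std):
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def e_step(threshold, demos):
    """Posterior over each missing action label, given the current program."""
    posteriors = []
    for x, y in demos:
        prior = policy_prob(threshold, x)
        joint = {a: prior[a] * gaussian_pdf(y, MEAN[a], NOISE_STD) for a in ACTIONS}
        total = sum(joint.values())
        posteriors.append({a: joint[a] / total for a in ACTIONS})
    return posteriors


def expected_log_lik(threshold, demos, posteriors):
    """Expected log-probability of the inferred labels under a candidate program
    (observation terms are omitted because they do not depend on the program)."""
    score = 0.0
    for (x, _), post in zip(demos, posteriors):
        prior = policy_prob(threshold, x)
        score += sum(post[a] * math.log(prior[a]) for a in ACTIONS)
    return score


def m_step(candidates, demos, posteriors):
    """Stand-in for program synthesis: enumerate candidate thresholds."""
    return max(candidates, key=lambda c: expected_log_lik(c, demos, posteriors))


def em(demos, candidates, iters=20):
    threshold = candidates[0]
    for _ in range(iters):
        posteriors = e_step(threshold, demos)  # infer missing labels
        new_threshold = m_step(candidates, demos, posteriors)  # refit program
        if new_threshold == threshold:
            break
        threshold = new_threshold
    return threshold


# Usage: synthetic unlabeled demonstrations from a hidden threshold of 5.0.
random.seed(0)
TRUE_THRESHOLD = 5.0
demos = []
for _ in range(500):
    x = random.uniform(0.0, 10.0)
    a = "DEC" if random.random() < policy_prob(TRUE_THRESHOLD, x)["DEC"] else "ACC"
    demos.append((x, random.gauss(MEAN[a], NOISE_STD)))  # label a is discarded

candidates = [0.5 * i for i in range(21)]  # thresholds 0.0, 0.5, ..., 10.0
learned = em(demos, candidates)
print("learned threshold:", learned)
```

Because the current program is always among the candidates and the M-step picks the candidate with the highest expected log-likelihood, the marginal likelihood of the observations cannot decrease from one iteration to the next (the standard generalized-EM argument). A real synthesizer would search a far richer program space than a single threshold.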