Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching.
CoRR(2023)
摘要
In real-world scenarios, arbitrary interactions with the environment can
often be costly, and actions of expert demonstrations are not always available.
To reduce the need for both, Offline Learning from Observations (LfO) is
extensively studied, where the agent learns to solve a task with only expert
states and \textit{task-agnostic} non-expert state-action pairs. The
state-of-the-art DIstribution Correction Estimation (DICE) methods minimize the
state occupancy divergence between the learner and expert policies. However,
they are limited to either $f$-divergences (KL and $\chi^2$) or Wasserstein
distance with Rubinstein duality, the latter of which constrains the underlying
distance metric crucial to the performance of Wasserstein-based solutions. To
address this problem, we propose Primal Wasserstein DICE (PW-DICE), which
minimizes the primal Wasserstein distance between the expert and learner state
occupancies with a pessimistic regularizer and leverages a contrastively
learned distance as the underlying metric for the Wasserstein distance.
Theoretically, we prove that our framework is a generalization of the
state-of-the-art, SMODICE, and unifies $f$-divergence and Wasserstein
minimization. Empirically, we find that PW-DICE improves upon several
state-of-the-art methods on multiple testbeds.
更多查看译文
关键词
imitation,primal wasserstein state
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要