C3PO: Learning to Achieve Arbitrary Goals via Massively Entropic Pretraining

Alexis Jacq, Manu Orsini, Gabriel Dulac-Arnold, Olivier Pietquin, Matthieu Geist, Olivier Bachem

arXiv (2022)

Abstract

Given a particular embodiment, we propose a novel method (C3PO) that learns policies able to achieve any arbitrary position and pose. Such a policy would allow for easier control and would be reusable as a key building block for downstream tasks. The method is two-fold. First, we introduce a novel exploration algorithm that optimizes for uniform coverage and is able to discover a set of achievable states, and we investigate its ability to attain both high coverage and hard-to-discover states. Second, we leverage this set of achievable states as training data for a universal goal-achievement policy, a goal-based SAC variant. We demonstrate the trained policy's performance in achieving a large number of novel states. Finally, we showcase the influence of massive unsupervised training of a goal-achievement policy with state-of-the-art pose-based control of the Hopper, Walker, Halfcheetah, Humanoid and Ant embodiments.
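
The notion of uniform coverage in the abstract can be read as maximizing the entropy of the visited-state distribution, since a uniform distribution over a finite set of cells has the highest entropy. As a minimal sketch, assuming a simple grid discretization of the state space, the snippet below measures how evenly a set of visited states is spread. The function name `coverage_entropy` and the `bin_width` parameter are illustrative assumptions and do not come from the paper; this is not the authors' exploration algorithm, only a way to quantify the objective it is described as optimizing.

```python
import math
from collections import Counter


def coverage_entropy(states, bin_width=0.5):
    """Shannon entropy (in nats) of visited states after grid discretization.

    Hypothetical helper for illustration only: `states` is a list of
    equal-length tuples of floats, and `bin_width` is an assumed cell size.
    """
    if not states:
        raise ValueError("states must be non-empty")
    # Map each continuous state to the integer index of its grid cell.
    cells = Counter(
        tuple(math.floor(x / bin_width) for x in state) for state in states
    )
    total = sum(cells.values())
    # Entropy of the empirical distribution over occupied cells.
    return -sum((n / total) * math.log(n / total) for n in cells.values())


# Four visits spread over four cells versus four visits in a single cell.
spread = [(0.1, 0.1), (0.6, 0.1), (0.1, 0.6), (0.6, 0.6)]
clumped = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.2, 0.2)]

print(coverage_entropy(spread) > coverage_entropy(clumped))
```

Under this measure, an explorer that spreads its visits evenly scores higher than one that revisits the same region, which is the qualitative behavior a uniform-coverage objective rewards.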

Keywords

Reinforcement Learning, Exploration, Goal-conditioned Policy, Continuous Control
