Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning
ICLR 2024 (2023)
Abstract
Training generally capable agents that thoroughly explore their environment
and learn new and diverse skills is a long-term goal of robot learning. Quality
Diversity Reinforcement Learning (QD-RL) is an emerging research area that
blends the best aspects of both fields – Quality Diversity (QD) provides a
principled form of exploration and produces collections of behaviorally diverse
agents, while Reinforcement Learning (RL) provides a powerful performance
improvement operator enabling generalization across tasks and dynamic
environments. Existing QD-RL approaches have been constrained to
sample-efficient, deterministic off-policy RL algorithms and/or evolution
strategies, and struggle with highly stochastic environments. For the first
time, we adapt on-policy RL, specifically Proximal Policy Optimization
(PPO), to the Differentiable Quality Diversity (DQD) framework and propose
additional improvements over prior work that enable efficient optimization and
discovery of novel skills on challenging locomotion tasks. Our new algorithm,
Proximal Policy Gradient Arborescence (PPGA), achieves state-of-the-art
results, including a 4x improvement in best reward over baselines on the
challenging humanoid domain.
Keywords
Reinforcement Learning, Quality Diversity, Robotics, Machine Learning, Evolution Strategies