ViSaRL: Visual Reinforcement Learning Guided by Human Saliency
CoRR (2024)
Abstract
Training robots to perform complex control tasks from high-dimensional pixel
input using reinforcement learning (RL) is sample-inefficient because image
observations consist primarily of task-irrelevant information. By
contrast, humans are able to visually attend to task-relevant objects and
areas. Based on this insight, we introduce Visual Saliency-Guided Reinforcement
Learning (ViSaRL). Using ViSaRL to learn visual representations significantly
improves the success rate, sample efficiency, and generalization of an RL agent
on diverse tasks, including the DeepMind Control benchmark and robot manipulation
both in simulation and on a real robot. We present approaches for incorporating
saliency into both CNN and Transformer-based encoders. We show that visual
representations learned using ViSaRL are robust to various sources of visual
perturbation, including perceptual noise and scene variations. ViSaRL nearly
doubles the success rate on the real-robot tasks compared to the baseline, which
does not use saliency.