Feature Extraction for Effective and Efficient Deep Reinforcement Learning on Real Robotic Platforms.

ICRA (2023)

Abstract
Deep reinforcement learning (DRL) methods can solve complex continuous control tasks in simulated environments by taking actions based solely on state observations at each decision point. Because of the dynamics involved, individual snapshots of real-world sensor measurements afford only partial state observability, so it is typical to use a history of observations to improve training and policy performance. Such intertemporal information can be further exploited using a recurrent neural network (RNN) to reduce the dimensionality of the dynamic state representation. However, using RNNs as an internal part of a DRL network presents challenges of its own, and even then the improvements in the resulting policies are usually limited. To address these shortcomings, we propose using gated feature extraction to improve DRL training of real-world robots. Specifically, we use an untrained gated recurrent unit (GRU) to encode a low-dimensional representation of the state observation sequence before passing it to the DRL training procedure. In addition to dimensionality reduction, this allows us to unroll the RNN by encoding the observations cumulatively as they are collected, thereby avoiding same-length input requirements, and to train the RL network on the raw observations at the current step combined with the GRU encoding of the preceding steps. Our simulation experiments employ gated feature extraction with the TD3 algorithm. Our results show that GRU-encoded state observations improve the training speed and execution performance of the TD3 algorithm, improving the learned policies in all 19 test cases, exceeding the maximum achieved reward by over 38% in 8 of them and doubling it in 3, while also outperforming a baseline implementation of SAC in 17 out of 19 environments. Moreover, the greatest improvement is seen in real-world experiments, where our approach successfully learns to balance a pendulum as well as a complex quadrupedal locomotion task. In contrast, the standard TD3 algorithm not only fails to show any learning progress, but also repeatedly damages the hardware.
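The sketch below illustrates the idea described in the abstract, not the authors' implementation: an untrained, frozen GRU summarizes the observation history one step at a time, and the policy network receives the current raw observation concatenated with that GRU encoding. All dimensions, network sizes, and variable names (obs_dim, hidden_dim, act_dim, actor) are illustrative assumptions; in the paper this feature would feed the actor and critics of TD3.

# Minimal sketch (PyTorch), assuming hypothetical dimensions; not the authors' code.
import torch
import torch.nn as nn

obs_dim, hidden_dim, act_dim = 24, 16, 4  # illustrative sizes

# Untrained GRU used purely as a fixed feature extractor.
gru = nn.GRU(input_size=obs_dim, hidden_size=hidden_dim, batch_first=True)
for p in gru.parameters():
    p.requires_grad = False  # weights stay at their random initialization

# Simple actor that sees [current raw observation, GRU encoding of history].
actor = nn.Sequential(
    nn.Linear(obs_dim + hidden_dim, 64), nn.ReLU(),
    nn.Linear(64, act_dim), nn.Tanh(),
)

h = torch.zeros(1, 1, hidden_dim)  # recurrent state carried across steps
obs = torch.randn(1, obs_dim)      # placeholder for a sensor reading

for step in range(5):
    # Unrolled encoding: feed one observation at a time and keep the hidden
    # state, so histories of any length are summarized incrementally.
    _, h = gru(obs.unsqueeze(1), h)
    features = torch.cat([obs, h.squeeze(0)], dim=-1)
    action = actor(features)       # a TD3-style learner would train this actor
    obs = torch.randn(1, obs_dim)  # next observation from the environment

Because the GRU hidden state is updated cumulatively at every step, no fixed-length observation window is required, which is the property the abstract highlights.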
Keywords
complex continuous control tasks, complex quadrupedal locomotion task, decision point, deep reinforcement learning methods, dimensionality reduction, DRL network, DRL training procedure, dynamic state representation, efficient deep reinforcement learning, execution performance, gated feature extraction, GRU-encoded state observations, GRU-encoding, individual snapshots, internal part, intertemporal information, learned policies, learning progress, low-dimension representation, partial state observability, policy performance, preceding steps, raw observations, real robotic platforms, real-world experiments, real-world robots, real-world sensor measurements, recurrent neural network, resulting policies, RL network, RNN, same-length input requirements, simulated environments, simulation experiments, standard TD3 algorithm, state observation sequence, training speed, untrained gated recurrent unit