Parallel Deep Reinforcement Learning Method for Gait Control of Biped Robot

IEEE Transactions on Circuits and Systems II: Express Briefs (2022)

Abstract
In this brief, a parallel Deep Deterministic Policy Gradient (DDPG) algorithm is presented for biped robot gait control. Biped robot gait control is a high-dimensional continuous control problem, and obtaining a fast and stable gait is challenging. Traditional methods cannot fully utilize the autonomous exploration capability of a biped robot. A multiple Actor-Critic (AC) network is established to expand the scope of exploration and improve training efficiency. To optimize the experience replay mechanism, an experience filtering unit is introduced, and a cosine similarity method is used to classify experience. Then, a Markov Decision Process (MDP) model based on knowledge and experience is designed to address the problem of sparse rewards. Finally, experimental results show that the parallel DDPG algorithm enables the biped robot to walk more quickly and stably, reaching a speed of 0.62 m/s.
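The abstract does not specify how the experience filtering unit applies cosine similarity, so the following is only a minimal sketch of one plausible reading: each transition's state vector is compared against a reference state, and the transition is routed to one of two replay buffers depending on the similarity. The names `ExperienceFilter`, `reference_state`, and `threshold`, as well as the two-buffer split, are assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for all-zero vectors.
    return dot / (norm_a * norm_b + eps)


class ExperienceFilter:
    """Hypothetical filter: splits transitions into two buffers by similarity.

    Assumption: classification compares the state to a fixed reference
    state; the paper may use a different comparison or more classes.
    """

    def __init__(self, reference_state, threshold=0.9):
        self.reference_state = list(reference_state)
        self.threshold = threshold
        self.similar = []      # transitions close in direction to the reference
        self.dissimilar = []   # all other transitions

    def add(self, state, action, reward, next_state):
        sim = cosine_similarity(state, self.reference_state)
        buffer = self.similar if sim >= self.threshold else self.dissimilar
        buffer.append((state, action, reward, next_state))
        return sim
```

With such a split, a training loop could sample from the two buffers at different rates; the sampling ratio would be a further design choice that the abstract does not describe.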
Keywords
Parallel deep deterministic policy gradient, biped robot, gait control, experience replay mechanism, knowledge and experience