Deep Bayesian Reinforcement Learning for Spacecraft Proximity Maneuvers and Docking
arXiv (2023)
Abstract
In the pursuit of autonomous spacecraft proximity maneuvers and docking (PMD),
we introduce a novel Bayesian actor-critic reinforcement learning algorithm to
learn a control policy with a stability guarantee. The PMD task is formulated
as a Markov decision process that reflects the relative dynamic model, the
docking cone, and the cost function. Drawing on Lyapunov theory, we frame
temporal difference learning as a constrained Gaussian process regression
problem. This formulation allows the state-value function to be expressed
as a Lyapunov function, leveraging the Gaussian
process and deep kernel learning. We develop a novel Bayesian quadrature policy
optimization procedure to analytically compute the policy gradient while
integrating Lyapunov-based stability constraints. This integration is pivotal
to satisfying the rigorous safety demands of spaceflight missions. The proposed
algorithm has been experimentally evaluated on a spacecraft air-bearing testbed
and shows promising performance.
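
For readers unfamiliar with the Lyapunov framing, the conditions a state-value function would typically need to satisfy to serve as a Lyapunov function can be sketched as below. This is the standard discrete-time stochastic form, not necessarily the exact constraint set used in the paper. The symbols $s^\ast$ (the docked equilibrium state), $c$ (the stage cost), and $\gamma$ (the discount factor) are assumptions introduced here for illustration only.

```latex
% Assumed notation: s^* is the target (docked) state, V the state-value function.
% Standard Lyapunov conditions, written for a stochastic discrete-time system:
\begin{align}
V(s^\ast) &= 0, \\
V(s) &> 0 \quad \forall\, s \neq s^\ast, \\
\mathbb{E}\!\left[ V(s_{t+1}) \mid s_t \right] - V(s_t) &< 0 \quad \forall\, s_t \neq s^\ast .
\end{align}
% Temporal difference relation that the regression would fit,
% assuming a stage cost c and a discount factor \gamma:
\begin{equation}
V(s_t) \approx c(s_t, a_t) + \gamma\, \mathbb{E}\!\left[ V(s_{t+1}) \mid s_t, a_t \right].
\end{equation}
```

Under these assumptions, fitting the temporal difference relation subject to the three conditions is one way to read "constrained Gaussian process regression": the regression supplies the value estimate, and the constraints are what would let that estimate certify stability.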