Mirror Descent for Stochastic Control Problems with Measure-valued Controls
CoRR(2024)
Abstract
This paper studies the convergence of the mirror descent algorithm for finite
horizon stochastic control problems with measure-valued control processes. The
control objective involves a convex regularisation function, denoted as h,
with regularisation strength determined by the weight τ≥ 0. The setting
covers regularised relaxed control problems. Under suitable conditions, we
establish the relative smoothness and convexity of the control objective with
respect to the Bregman divergence of h, and prove linear convergence of the
algorithm for τ=0 and exponential convergence for τ>0. The results
apply to common regularisers including relative entropy, χ^2-divergence,
and entropic Wasserstein costs. This validates recent reinforcement learning
heuristics that adding regularisation accelerates the convergence of gradient
methods. The proof exploits careful regularity estimates of backward stochastic
differential equations in the bounded mean oscillation norm.