Parallel and Flexible Dynamic Programming via the Mini-Batch Bellman Operator

IEEE TRANSACTIONS ON AUTOMATIC CONTROL (2024)

Abstract
The Bellman operator is the foundation of dynamic programming (DP). The Gauss-Seidel operator offers an alternative: whereas the Bellman operator processes all states at once, the Gauss-Seidel operator updates one state at a time and incorporates the interim results into the computation. DP methods based on the Gauss-Seidel operator have a provably better convergence rate, but this comes at the price of an inherent sequentiality that prevents the exploitation of modern multicore systems. In this work, we propose a new operator for DP, the mini-batch Bellman operator, which trades off the better convergence rate of methods based on the Gauss-Seidel operator against the parallelization capability offered by the Bellman operator. After introducing the new operator, we conduct a theoretical analysis that validates its fundamental properties. These properties allow the new operator to be deployed in the main DP schemes, such as value iteration and modified policy iteration. We compare the convergence of the DP algorithm based on the new operator with that of its earlier counterparts, shedding light on the algorithmic advantages of the new formulation and on the impact of the batch-size parameter on convergence. Finally, we conduct an extensive numerical evaluation of the new operator. In accordance with the theoretical derivations, the numerical results show the competitive performance of the proposed operator and its superior flexibility, which allows the efficiency of its iterations to be adapted to different MDP structures and hardware setups.
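The abstract does not state the formal definition of the mini-batch Bellman operator, so the following is only a minimal sketch of one plausible reading: states are split into consecutive batches, every state in a batch is backed up from the same value vector (so those backups could run in parallel), and the results are written back before the next batch starts. Under this reading, a batch size equal to the number of states recovers a plain Bellman sweep and a batch size of 1 recovers a Gauss-Seidel sweep. The function names, the cost-minimizing discounted formulation, and the toy MDP are all assumptions made for illustration, not taken from the paper.

```python
def backup(V, P, g, gamma, s):
    """One-state Bellman backup: min over actions of stage cost plus
    discounted expected cost-to-go. P[a][s] is a row of transition
    probabilities, g[s][a] a stage cost (assumed data layout)."""
    return min(
        g[s][a] + gamma * sum(p * v for p, v in zip(P[a][s], V))
        for a in range(len(P))
    )


def mini_batch_sweep(V, P, g, gamma, batch_size):
    """One sweep over all states in consecutive batches (assumed scheme)."""
    V = list(V)
    n = len(V)
    for start in range(0, n, batch_size):
        batch = range(start, min(start + batch_size, n))
        # All backups in this batch read the same V, so they are independent.
        new_vals = [backup(V, P, g, gamma, s) for s in batch]
        # Write back so that later batches see these interim results.
        for s, v in zip(batch, new_vals):
            V[s] = v
    return V


def value_iteration(P, g, gamma, batch_size, tol=1e-10, max_sweeps=10_000):
    """Repeat sweeps until the sup-norm change falls below tol."""
    V = [0.0] * len(g)
    for sweep in range(1, max_sweeps + 1):
        V_new = mini_batch_sweep(V, P, g, gamma, batch_size)
        diff = max(abs(a - b) for a, b in zip(V, V_new))
        V = V_new
        if diff < tol:
            return V, sweep
    return V, max_sweeps


# Hypothetical 4-state, 2-action MDP; state 3 is absorbing and cost-free.
P = [
    # action 0: stay in place
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
    # action 1: move right with probability 0.9, otherwise stay
    [[0.1, 0.9, 0, 0], [0, 0.1, 0.9, 0], [0, 0, 0.1, 0.9], [0, 0, 0, 1]],
]
g = [[1.0, 1.5], [1.0, 1.5], [1.0, 1.5], [0.0, 0.0]]
gamma = 0.9

V_gs, sweeps_gs = value_iteration(P, g, gamma, batch_size=1)      # Gauss-Seidel-like
V_mb, sweeps_mb = value_iteration(P, g, gamma, batch_size=2)      # mini-batch
V_bell, sweeps_bell = value_iteration(P, g, gamma, batch_size=4)  # Bellman-like
print(sweeps_gs, sweeps_mb, sweeps_bell)
```

In this sketch all three batch sizes converge to the same value function; only the number of sweeps and the amount of work that can run concurrently within a sweep differ, which is the tradeoff the batch-size parameter is meant to expose.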
Keywords
Convergence, Costs, Dynamic programming, Optimal control, Cost function, Standards, Process control, Algorithms, dynamic programming (DP), parallel programming