A Generalized Fundamental Matrix for Computing Fundamental Quantities of Markov Systems.

arXiv: Optimization and Control (2016)

Abstract
As is well known, the fundamental matrix $(I - P + e\pi)^{-1}$ plays an important role in the performance analysis of Markov systems, where $P$ is the transition probability matrix, $e$ is the column vector of ones, and $\pi$ is the row vector of the steady-state distribution. It is used to compute the performance potential (relative value function) of Markov decision processes under the average criterion, for example $g = (I - P + e\pi)^{-1} f$, where $g$ is the column vector of performance potentials and $f$ is the column vector of reward functions. However, $\pi$ must be computed before $(I - P + e\pi)^{-1}$ can be evaluated. In this paper, we derive a generalized version of the fundamental matrix, $(I - P + e r)^{-1}$, where $r$ can be any given row vector satisfying $r e \neq 0$. With this generalized fundamental matrix, we can compute $g = (I - P + e r)^{-1} f$. The steady-state distribution is computed as $\pi = r (I - P + e r)^{-1}$. The Q-factors at every state-action pair can be computed in a similar way. These formulas may give some insight into how to efficiently compute or estimate the values of $g$, $\pi$, and the Q-factors in Markov systems, which are fundamental quantities for the performance optimization of Markov systems.
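The formulas in the abstract can be checked numerically on a small example. The following is a minimal sketch, not the paper's implementation: the 3-state transition matrix `P`, the reward vector `f`, the choice `r = (1, 0, 0)`, and the helper name `generalized_fundamental_matrix` are all illustrative assumptions, and the chain is assumed to be ergodic so that the inverse exists. Since performance potentials are determined only up to an additive constant, the sketch compares $g$ against the classical $(I - P + e\pi)^{-1} f$ by looking at their difference, which should be a constant vector.

```python
import numpy as np

def generalized_fundamental_matrix(P, r):
    """Return (I - P + e r)^{-1} for a row vector r with r e != 0 (hypothetical helper)."""
    n = P.shape[0]
    e = np.ones((n, 1))                      # column vector of ones
    return np.linalg.inv(np.eye(n) - P + e @ r.reshape(1, n))

# Illustrative ergodic chain (assumed data, not taken from the paper).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.1, 0.5]])
f = np.array([1.0, 2.0, 4.0])                # reward vector (assumed)
r = np.array([1.0, 0.0, 0.0])                # any row vector with r e != 0

Z = generalized_fundamental_matrix(P, r)
pi = r @ Z                                   # steady-state distribution: pi = r (I - P + e r)^{-1}
g = Z @ f                                    # performance potentials: g = (I - P + e r)^{-1} f
eta = pi @ f                                 # long-run average reward

# Classical fundamental matrix, which needs pi in advance.
g_classic = generalized_fundamental_matrix(P, pi) @ f
diff = g - g_classic                         # expected to be a constant vector

print("pi   =", pi)
print("g    =", g)
print("diff =", diff)
```

Note that `r` here needs no knowledge of $\pi$; any row vector whose entries do not sum to zero would do, and a different choice would only shift `g` by a constant.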