Worst-case analysis for a leader-follower partially observable stochastic game

arXiv: Optimization and Control (2022)

Cited by 23 | Views: 5
Abstract
Although Partially Observable Stochastic Games (POSGs) provide a powerful mathematical paradigm for modeling multi-agent dynamic decision making under uncertainty and partial information, they are notoriously hard to solve (e.g., common-payoff POSGs are NEXP-complete) and impose extensive data requirements on each agent. The latter may pose a serious challenge to a defending agent who has limited knowledge of the adversary. A worst-case analysis can significantly reduce both the model's computational complexity and the data requirements regarding the adversary; further, a (near-)optimal worst-case policy can serve as a useful guide for action selection by risk-averse defenders (e.g., as a benchmark). This article introduces a worst-case analysis for a leader-follower POSG in which: (i) the defending leader has little knowledge of the adversarial follower's reward structure, level of rationality, and process for gathering and transmitting data relevant to decision making; and (ii) the objective is to determine the best worst-case value function and a control strategy for the leader. We show that the worst-case assumption transforms this POSG into a more computationally tractable single-agent problem with a simple sufficient statistic. However, the value function can be non-convex, in contrast with the value function of a partially observable Markov decision process. We design an iterative solution procedure for computing a lower bound on the leader's value function and its control policy in the finite-horizon case. The approach is numerically illustrated on a security example to support decision making.
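Since the abstract describes reducing the worst-case leader problem to a single-agent problem with a sufficient statistic, solved by an iterative finite-horizon procedure, the sketch below illustrates the general flavor of such a computation. It is not the paper's algorithm: it performs a finite-horizon maximin backup on a discretized belief simplex for a hypothetical two-state model, where the leader maximizes over its actions and the worst case is taken over an assumed finite set of follower actions. All quantities (T, Z, R, the belief grid, and the simplification that the follower's action enters the belief update) are illustrative assumptions.

```python
# Illustrative sketch only (hypothetical model, not the paper's method):
# finite-horizon maximin value backup on a discretized belief simplex.
import numpy as np

S, A_L, A_F, O = 2, 2, 2, 2          # states, leader/follower actions, observations
rng = np.random.default_rng(0)

# Hypothetical dynamics: T[al, af, s, s'] transitions, Z[al, af, s', o] observations,
# R[s, al, af] leader reward.
T = rng.dirichlet(np.ones(S), size=(A_L, A_F, S))
Z = rng.dirichlet(np.ones(O), size=(A_L, A_F, S))
R = rng.uniform(0.0, 1.0, size=(S, A_L, A_F))

# Belief grid over the 2-state simplex: b = (p, 1 - p).
grid = np.linspace(0.0, 1.0, 51)

def belief_update(b, al, af, o):
    """Bayes update of the leader's belief after (a_l, a_f, o); also returns Pr(o)."""
    bp = np.einsum('s,st->t', b, T[al, af]) * Z[al, af, :, o]
    norm = bp.sum()
    return (bp / norm, norm) if norm > 1e-12 else (b, 0.0)

def interp(V, b):
    """Value at belief b by linear interpolation on the grid (uses b[0] only)."""
    return np.interp(b[0], grid, V)

def maximin_backup(V_next):
    """One backup: V_t(b) = max_{a_l} min_{a_f} [ b.R + sum_o Pr(o) V_{t+1}(b') ]."""
    V = np.zeros_like(V_next)
    policy = np.zeros(len(grid), dtype=int)
    for i, p in enumerate(grid):
        b = np.array([p, 1.0 - p])
        best = -np.inf
        for al in range(A_L):
            worst = np.inf
            for af in range(A_F):
                q = b @ R[:, al, af]
                for o in range(O):
                    bp, pr_o = belief_update(b, al, af, o)
                    q += pr_o * interp(V_next, bp)
                worst = min(worst, q)
            if worst > best:
                best, policy[i] = worst, al
        V[i] = best
    return V, policy

# Finite-horizon recursion (horizon 5): V_T = 0, then back up to t = 0.
V = np.zeros(len(grid))
for t in range(5):
    V, pi = maximin_backup(V)
print("worst-case value at uniform belief:", interp(V, np.array([0.5, 0.5])))
```

Unlike a standard POMDP backup, the pointwise maximin over follower actions need not yield a piecewise-linear convex function of the belief, which is in line with the abstract's remark that the worst-case value function can be non-convex.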
Keywords
Worst-case analysis, non-convex value function, partially observable Markov decision processes, partially observable stochastic game