Learning in Stackelberg Games with Non-myopic Agents

Economics and Computation (EC 2022)

Abstract
Stackelberg games are a canonical model for strategic principal-agent interactions. Consider, for instance, a defense system that distributes its security resources across high-risk targets before attacks are executed; a tax policymaker who sets the rules for when audits are triggered before seeing filed tax reports; or a seller who chooses a price before knowing a customer's proclivity to buy. In each of these scenarios, a principal first selects an action x ∈ X and then an agent reacts with an action y ∈ Y, where X and Y are the principal's and agent's action spaces, respectively. In the examples above, the agent's action corresponds to which target to attack, how much tax to pay to evade an audit, and how much to purchase, respectively. Typically, the principal wants an x that maximizes their payoff when the agent plays a best response y = br(x); such a pair (x, y) is a Stackelberg equilibrium. By committing to a strategy, the principal can guarantee a higher payoff than in the (Nash) equilibrium of the corresponding simultaneous-play game. However, finding such a strategy requires knowledge of the agent's payoff function.

When faced with unknown agent payoffs, the principal can attempt to learn a best response through repeated interactions with the agent. If a (naïve) agent is unaware that such learning occurs and always plays a best response, the principal can use classical online learning approaches to optimize their own payoff in the stage game. Learning from myopic agents has been studied extensively across Stackelberg games, including security games [2, 6, 7], demand learning [1, 5], and strategic classification [3, 4]. However, long-lived agents will generally not volunteer information that can be used against them in the future. This is especially true in online environments, where a learner seeks to exploit recently learned patterns of behavior as soon as possible, and the agent therefore sees a tangible advantage in deviating from its instantaneous best response to lead the learner astray.

This trade-off between the (statistical) efficiency of learning algorithms and the perverse incentives they may create over the long term brings us to the main questions of this work: What are principled approaches to learning against non-myopic agents in general Stackelberg games? How can insights from learning against myopic agents be applied to learning in the non-myopic case?
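To make the commitment model concrete, here is a minimal sketch (not from the paper) of computing a pure-strategy Stackelberg equilibrium by enumeration when both payoff matrices are known; the paper's setting is precisely the harder case where the agent's payoff matrix is unknown and must be learned from repeated interaction. The function name, the matrices P and A, and the tie-breaking convention are illustrative assumptions.

```python
import numpy as np

def stackelberg_equilibrium(P, A):
    """P[x, y]: principal's payoff; A[x, y]: agent's payoff.

    The principal commits to action x; the agent plays a best response
    y = br(x) ∈ argmax_y A[x, y], with ties broken in the principal's
    favor (the usual 'strong' Stackelberg convention). Returns the
    committed action, the agent's response, and the principal's payoff.
    """
    best_x, best_y, best_val = None, None, -np.inf
    for x in range(P.shape[0]):
        # All agent best responses to the committed action x.
        br = np.flatnonzero(A[x] == A[x].max())
        # Break ties among best responses in the principal's favor.
        y = br[np.argmax(P[x, br])]
        if P[x, y] > best_val:
            best_x, best_y, best_val = x, y, P[x, y]
    return best_x, best_y, best_val

# Toy security-game-style instance (payoffs are made up for illustration):
# rows are principal commitments, columns are agent responses.
P = np.array([[2.0, 0.0], [1.0, 3.0]])
A = np.array([[1.0, 2.0], [4.0, 0.0]])
print(stackelberg_equilibrium(P, A))  # -> (1, 0, 1.0)
```

In a learning loop against a myopic agent, each round would reveal br(x) for the played x, so the unknown row A[x] constrains itself through observed responses; a non-myopic agent may instead misreport br(x) early to steer the principal toward commitments it prefers later, which is the difficulty the paper addresses.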
Keywords
Stackelberg games, agents, learning, non-myopic