Differentially Private Reinforcement Learning with Self-Play
arXiv (2024)
Abstract
We study the problem of multi-agent reinforcement learning (multi-agent RL)
with differential privacy (DP) constraints. This is well-motivated by various
real-world applications involving sensitive data, where it is critical to
protect users' private information. We first extend the definitions of Joint DP
(JDP) and Local DP (LDP) to two-player zero-sum episodic Markov Games, where
both definitions ensure trajectory-wise privacy protection. Then we design a
provably efficient algorithm based on optimistic Nash value iteration and
privatization of Bernstein-type bonuses. The algorithm is able to satisfy JDP
and LDP requirements when instantiated with appropriate privacy mechanisms.
Furthermore, for both notions of DP, our regret bound generalizes the best
known result for single-agent RL, and it can also reduce to the best known
result for multi-agent RL without privacy constraints. To the best of our
knowledge, these are the first results towards understanding trajectory-wise
privacy protection in multi-agent RL.
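
As a rough illustration of what "privatization of Bernstein-type bonuses" can look like, the sketch below adds Laplace noise to a table of visit counts and then computes a generic Bernstein-style bonus from the noisy counts. It is not the paper's algorithm. The function names, the sensitivity value (assumed to be the horizon H, since one trajectory changes at most H count entries by one under add/remove neighbors), the clipping at 1, and the exact bonus form are all assumptions made for illustration. A full analysis would also need to account for the injected noise inside the bonus, which is omitted here.

```python
import numpy as np

def privatize_counts(counts, epsilon, horizon, rng):
    """Hypothetical helper: Laplace mechanism on a visit-count table.

    Assumes L1 sensitivity equal to `horizon` (one trajectory touches at
    most `horizon` entries, each by 1). This is an illustrative assumption.
    """
    scale = horizon / epsilon
    noisy = counts + rng.laplace(loc=0.0, scale=scale, size=counts.shape)
    # Clipping is post-processing, so it does not weaken the DP guarantee.
    return np.maximum(noisy, 1.0)

def bernstein_style_bonus(variance, private_counts, horizon, log_term):
    """Generic Bernstein-style bonus: a variance term plus a range term.

    The form sqrt(2 * Var * log / n) + H * log / n is illustrative only.
    """
    variance_term = np.sqrt(2.0 * variance * log_term / private_counts)
    range_term = horizon * log_term / private_counts
    return variance_term + range_term

rng = np.random.default_rng(0)
H = 5
# Toy count table indexed by (state, joint action); values are made up.
counts = np.array([[10.0, 200.0], [0.0, 50.0]])
private = privatize_counts(counts, epsilon=1.0, horizon=H, rng=rng)
bonus = bernstein_style_bonus(
    variance=np.full(counts.shape, 0.25),
    private_counts=private,
    horizon=H,
    log_term=np.log(100.0),
)
print(private)
print(bonus)
```

Smaller values of `epsilon` give a larger noise scale, so rarely visited entries become less reliable and their bonuses fluctuate more. That trade-off between privacy and exploration is the kind of cost a private regret bound has to capture.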