Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (2022)

Abstract
To understand the security threats to reinforcement learning (RL) algorithms, this paper studies poisoning attacks that manipulate any order-optimal learning algorithm toward a targeted policy in episodic RL, and examines the potential damage of two natural types of poisoning attacks, namely the manipulation of rewards and the manipulation of actions. We discover that the effect of an attack crucially depends on whether the rewards are bounded or unbounded. In bounded reward settings, we show that neither reward manipulation alone nor action manipulation alone can guarantee a successful attack. However, by combining reward and action manipulation, the adversary can manipulate any order-optimal learning algorithm to follow any targeted policy with \Theta(\sqrt{T}) total attack cost, which is order-optimal, without any knowledge of the underlying MDP. In contrast, in unbounded reward settings, we show that reward manipulation alone is sufficient for the adversary to manipulate any order-optimal learning algorithm to follow any targeted policy using \tilde{O}(\sqrt{T}) amount of contamination. Our results reveal useful insights about what can or cannot be achieved by poisoning attacks, and should spur more work on the design of robust RL algorithms.
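To make the two attack primitives concrete, the following is a minimal sketch (not the paper's construction) of an adversary that sits between the environment and the learner in episodic RL and perturbs the feedback the learner observes, steering it toward a target policy. The interface names (env, learner, target_policy) and the fixed perturbation size delta are assumptions made purely for illustration.

```python
def poisoned_episode(env, learner, target_policy, delta=1.0,
                     poison_rewards=True, poison_actions=False):
    """Run one episode with on-the-fly poisoning; return the attack cost.

    Attack cost is measured as the total absolute reward change plus the
    number of action overrides, mirroring a cumulative contamination budget
    (the paper's results bound this by roughly sqrt(T) over T steps).
    """
    state = env.reset()
    attack_cost = 0.0
    done = False
    while not done:
        action = learner.act(state)
        target_action = target_policy(state)

        # Action manipulation: replace the learner's chosen action with the
        # target action before it is executed.
        executed_action = action
        if poison_actions and action != target_action:
            executed_action = target_action
            attack_cost += 1.0

        next_state, reward, done, _ = env.step(executed_action)

        # Reward manipulation: make the target action look strictly better
        # by adding a bonus when it is taken and a penalty otherwise.
        observed_reward = reward
        if poison_rewards:
            observed_reward += delta if executed_action == target_action else -delta
            attack_cost += abs(observed_reward - reward)

        learner.observe(state, executed_action, observed_reward,
                        next_state, done)
        state = next_state
    return attack_cost
```

In the bounded reward setting such additive reward perturbations must stay within the allowed reward range, which is why, as the abstract notes, reward or action manipulation alone is insufficient and the adversary must combine the two.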