Adaptive Discounting of Training Time Attacks
CoRR(2024)
Abstract
Among the most insidious attacks on Reinforcement Learning (RL) solutions are
training-time attacks (TTAs), which create loopholes and backdoors in the learned
behaviour. Going beyond simple disruption, constructive TTAs (C-TTAs) are
now available, in which the attacker forces a specific target behaviour upon a
training RL agent (the victim). However, even state-of-the-art C-TTAs focus on
target behaviours that the victim could adopt naturally were it not for a
particular feature of the environment dynamics, which the C-TTA exploits. In this
work, we show that a C-TTA is possible even when the target behaviour is
unadoptable due to both the environment dynamics and its non-optimality with
respect to the victim's objective(s). To find efficient attacks in this setting,
we develop a specialised variant of the DDPG algorithm, which we term
gammaDDPG, that learns this stronger version of C-TTA. gammaDDPG dynamically
alters the planning horizon of the attack policy based on the victim's current
behaviour. This improves how attack effort is distributed over the attack timeline
and reduces the effect of the attacker's uncertainty about the victim. To
demonstrate the features of our method and relate the results more directly to prior
research, our experiments use a 3D grid domain borrowed from a state-of-the-art C-TTA.
Code is available at "bit.ly/github-rb-gDDPG".
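The abstract does not spell out how gammaDDPG varies the planning horizon. The title's "adaptive discounting" suggests the horizon is controlled through the discount factor γ, whose effective horizon is roughly 1/(1 − γ). The sketch below is a hypothetical illustration under that assumption, not the authors' method: it maps a made-up scalar `victim_distance` (0 means the victim already shows the target behaviour, 1 means it is far from it) to γ, then uses that γ in the standard DDPG critic target y = r + γ(1 − done)·Q'(s', μ'(s')). The names `adaptive_gamma`, `victim_distance`, `gamma_min` and `gamma_max`, and the linear "farther means longer horizon" mapping, are all assumptions.

```python
# Hypothetical sketch only: NOT the gammaDDPG implementation from the paper.
# Assumption: the attacker's planning horizon is set via the discount factor,
# chosen from a scalar summary of the victim's current behaviour.


def adaptive_gamma(victim_distance: float,
                   gamma_min: float = 0.5,
                   gamma_max: float = 0.99) -> float:
    """Map a normalised victim-to-target distance in [0, 1] to a discount factor.

    Assumed mapping: the farther the victim is from the target behaviour,
    the longer the attacker plans ahead (larger gamma).
    """
    d = min(max(victim_distance, 0.0), 1.0)  # clip to [0, 1]
    return gamma_min + d * (gamma_max - gamma_min)


def effective_horizon(gamma: float) -> float:
    """Rough number of steps that matter under discount gamma: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


def td_target(reward: float, next_q: float, done: bool, gamma: float) -> float:
    """Standard DDPG critic target: r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    return reward + gamma * (1.0 - float(done)) * next_q


if __name__ == "__main__":
    for dist in (0.0, 0.5, 1.0):
        g = adaptive_gamma(dist)
        print(f"distance={dist:.1f} gamma={g:.3f} "
              f"horizon~{effective_horizon(g):.1f} "
              f"target={td_target(1.0, 10.0, False, g):.2f}")
```

With this mapping, a victim far from the target behaviour gives γ = 0.99 (a horizon of about 100 steps), while a victim already close to it gives γ = 0.5 (about 2 steps), so the attacker's critic weighs long-term effects only when it still has far to push the victim. Whether gammaDDPG uses this direction or a linear form is not stated in the abstract.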