Learning robust policies when losing control

Adaptive and Learning Agents Workshop at AAMAS (2018)

Abstract
Many real-world applications require control strategies that provide robustness against a model of potential temporary external control, such as failures of the designed controller or malicious attacks. In this article we assume a Markovian control model as a stepping stone towards extending Q-learning akin to the options framework, but addressing the risk of involuntarily losing control to possibly malicious ‘options’. The resulting reinforcement learning algorithm maximises expected return, and is model-free with respect to the domain dynamics but model-based with respect to control transitions. Our model allows us to exploit parallel off-policy updates to learn efficiently from experience. Results demonstrate that effective safe strategies can be learned from mistakes, possibly even before attacks occur. Our algorithm compares favourably to on-policy SARSA and Expected SARSA and to off-policy Q-learning in a multi-agent benchmark, can be trained using forward domain models, and is compatible with many state-of-the-art extensions, such as deep learning, Retrace(λ), or Q(σ). We thus pave the way to learning robust strategies in critical multi-agent domains, such as smart grids, where graceful degradation is a prerequisite.
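To make the flavour of such an update concrete, the sketch below shows a tabular Q-learning-style backup that is model-free in the domain dynamics but uses a known probability of losing control on the next step. The function name, the parameter `beta`, and the worst-case treatment of the external controller are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def robust_q_update(Q, s, a, r, s_next, beta, alpha=0.1, gamma=0.99):
    """One tabular backup mixing the agent's greedy value with a pessimistic
    value for next steps where control is lost with (assumed known) probability beta."""
    agent_value = np.max(Q[s_next])      # value if the agent keeps control
    adversary_value = np.min(Q[s_next])  # worst-case value if control is lost
    target = r + gamma * ((1.0 - beta) * agent_value + beta * adversary_value)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Example usage: initialise a table and apply the backup after each observed transition.
Q = np.zeros((10, 4))                    # 10 states, 4 actions (arbitrary sizes)
Q = robust_q_update(Q, s=0, a=1, r=1.0, s_next=2, beta=0.2)
```

Because the control-transition probability enters the backup explicitly rather than through sampled behaviour, updates of this kind can be applied off-policy and in parallel across experience, consistent with the abstract's description.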