Value Preserving State-Action Abstractions
International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 108, 2020
Abstract
Abstraction can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information, potentially compromising an agent's ability to represent high-value policies. To mitigate this, we introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.
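For concreteness, below is a minimal Python sketch of the structure the abstract describes: an option defined relative to a state abstraction φ, which can initiate in any ground state mapping to a given abstract state and terminates as soon as the agent leaves that abstract state. The class name, the grid-like φ, and the toy policy are illustrative assumptions for exposition, not the paper's implementation.

    # Illustrative sketch only: a phi-relative option as described in the abstract.
    # Names (PhiRelativeOption, phi, etc.) are hypothetical, not from the paper.

    class PhiRelativeOption:
        """An option defined relative to a state abstraction phi.

        It may initiate in any ground state s with phi(s) == abstract_state,
        and terminates (beta = 1) as soon as phi(s) != abstract_state.
        """

        def __init__(self, phi, abstract_state, policy):
            self.phi = phi                    # phi: ground state -> abstract state
            self.abstract_state = abstract_state
            self.policy = policy              # maps ground states in the cluster to actions

        def can_initiate(self, s):
            return self.phi(s) == self.abstract_state

        def terminates(self, s):
            # Termination probability is 1 outside the abstract state, 0 inside.
            return self.phi(s) != self.abstract_state

        def act(self, s):
            assert self.can_initiate(s), "option executed outside its initiation set"
            return self.policy[s]


    # Toy example: a coarse phi grouping states 0-3 into abstract state 0,
    # 4-7 into abstract state 1, and so on; dynamics are entirely made up.
    phi = lambda s: s // 4
    go_right = PhiRelativeOption(phi, 0, {0: "right", 1: "right", 2: "right", 3: "right"})
    assert go_right.can_initiate(2) and not go_right.terminates(2)
    assert go_right.terminates(4)   # state 4 lies in a different abstract state

Under this framing, the paper's value-loss question becomes: which sets of such options, paired with φ, still let the agent represent a policy within a bounded loss of optimal in the underlying MDP.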