Trading Utility and Uncertainty: Applying the Value of Information to Resolve the Exploration–Exploitation Dilemma in Reinforcement Learning

Handbook of Reinforcement Learning and Control, Studies in Systems, Decision and Control (2021)

Abstract
A fundamental problem in reinforcement learning is the exploration–exploitation dilemma: a search problem that entails sufficiently investigating the possible action choices while exploiting those that work well in certain contexts. Few exploration mechanisms, however, provide expected performance guarantees for a given amount of search. Here, we show that this dilemma can be addressed, and the expected agent performance quantified, by optimizing Stratonovich's value of information. The value of information is an information-theoretic criterion that specifies the greatest increase in rewards, from the worst case, subject to a given amount of uncertainty. In the context of reinforcement learning, uncertainty is quantified by a constrained mutual dependence between random variables. When the mutual dependence between the random variables goes to zero, agents tend to exploit their acquired knowledge about the environment; little to no improvement in policy performance is obtained in this case. As the mutual dependence increases, a greater amount of exploration is permitted and the policy can converge to the globally best action-selection strategy. Optimizing the value of information yields action-selection update strategies that, in the limit, are theoretically guaranteed to uncover the optimal policy for a given amount of mutual dependence. We show that, in a finite number of episodes, the value of information yields policies that outperform conventional exploration mechanisms for both single-state and multi-state, multi-action environment abstractions based on Markov decision processes.
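The abstract does not give the concrete update rule, but the value-of-information formulation it describes, maximizing expected rewards subject to a bound on the mutual dependence I(S; A) between states and actions, admits a Boltzmann-like fixed point via the Lagrangian. The Python sketch below is a hypothetical illustration of that structure, not the authors' implementation: the function name voi_policy, the inverse-temperature parameter beta (the Lagrange multiplier for the mutual-dependence constraint), and the toy Q-table are all assumptions introduced here.

```python
import numpy as np

def voi_policy(Q, state_dist, beta, n_iters=200, tol=1e-8):
    """Hypothetical sketch of a value-of-information-style policy.

    Alternates p(a|s) proportional to p(a) * exp(beta * Q[s, a]) with the
    marginal p(a) = sum_s rho(s) * p(a|s); `beta` is the Lagrange multiplier
    that trades expected return against the mutual dependence I(S; A).
    All names and defaults here are assumptions, not the paper's API.
    """
    n_actions = Q.shape[1]
    p_a = np.full(n_actions, 1.0 / n_actions)        # initial action marginal p(a)
    for _ in range(n_iters):
        logits = np.log(p_a)[None, :] + beta * Q     # log p(a) + beta * Q(s, a)
        logits -= logits.max(axis=1, keepdims=True)  # guard against overflow
        policy = np.exp(logits)
        policy /= policy.sum(axis=1, keepdims=True)  # normalize rows: p(a|s)
        p_a_new = state_dist @ policy                # re-estimate the marginal p(a)
        if np.max(np.abs(p_a_new - p_a)) < tol:      # fixed point reached
            p_a = p_a_new
            break
        p_a = p_a_new
    return policy

# Toy 2-state, 3-action example (made-up numbers).
Q = np.array([[1.0, 0.2, 0.0],
              [0.1, 0.9, 0.3]])
rho = np.array([0.5, 0.5])                           # state visitation distribution
print(voi_policy(Q, rho, beta=0.5))    # low beta: stays near the state-independent marginal
print(voi_policy(Q, rho, beta=50.0))   # high beta: near-greedy action in every state
```

With small beta the policy barely depends on the state, so the mutual dependence I(S; A) is near zero; with large beta the policy can select a different best action in every state, which mirrors the limiting cases the abstract describes.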
Keywords
exploration–exploitation dilemma, reinforcement, uncertainty, trading