Pruning Dominated Policies in Multiobjective Pareto Q-Learning.
Advances in Artificial Intelligence, CAEPIA 2018 (2018)
Abstract
The solution to a Multi-Objective Reinforcement Learning problem is a set of Pareto-optimal policies. MPQ-learning is a recent algorithm that approximates the set of all Pareto-optimal deterministic policies by directly generalizing Q-learning to the multiobjective setting. In this paper we present a modification of MPQ-learning that avoids useless cyclical policies and thus reduces the number of training steps required for convergence.
Keywords
Pareto-optimal Policies, Training Step, Nonstationary Policies, Reward Vector, Multiobjective Reinforcement Learning
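
The abstract relies on the notion of Pareto optimality over reward vectors. As a rough illustration only, the sketch below filters a set of vectors down to its non-dominated members, assuming every objective is maximized. The names `dominates` and `pareto_front` are hypothetical, and this is not the MPQ-learning algorithm or the pruning rule proposed in the paper, only the basic dominance test such methods build on.

```python
from typing import Iterable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u Pareto-dominates v (assumption: all objectives maximized).

    u dominates v when it is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def pareto_front(vectors: Iterable[Sequence[float]]) -> List[Vector]:
    """Keep only the non-dominated vectors, dropping exact duplicates."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(tuple(v) for v in vectors))
    return [u for u in unique if not any(dominates(w, u) for w in unique)]


if __name__ == "__main__":
    # Hypothetical two-objective reward vectors.
    candidates = [(1, 5), (3, 3), (2, 2), (5, 1), (3, 3)]
    print(pareto_front(candidates))
```

In this example `(2, 2)` is dropped because `(3, 3)` is better in both objectives, while `(1, 5)`, `(3, 3)` and `(5, 1)` are mutually incomparable and all remain.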