Divide and Conquer: Provably Unveiling the Pareto Front with Multi-Objective Reinforcement Learning
CoRR(2024)
摘要
A significant challenge in multi-objective reinforcement learning is
obtaining a Pareto front of policies that attain optimal performance under
different preferences. We introduce Iterated Pareto Referent Optimisation
(IPRO), a principled algorithm that decomposes the task of finding the Pareto
front into a sequence of single-objective problems for which various solution
methods exist. This enables us to establish convergence guarantees while
providing an upper bound on the distance to undiscovered Pareto optimal
solutions at each step. Empirical evaluations demonstrate that IPRO matches or
outperforms methods that require additional domain knowledge. By leveraging
problem-specific single-objective solvers, our approach also holds promise for
applications beyond multi-objective reinforcement learning, such as in
pathfinding and optimisation.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要