QGFN: Controllable Greediness with Action Values
CoRR (2024)
Abstract
Generative Flow Networks (GFlowNets; GFNs) are a family of
reward/energy-based generative methods for combinatorial objects, capable of
generating diverse and high-utility samples. However, biasing GFNs towards
producing high-utility samples is non-trivial. In this work, we leverage
connections between GFNs and reinforcement learning (RL) and propose to combine
the GFN policy with an action-value estimate, Q, to create greedier sampling
policies which can be controlled by a mixing parameter. We show that several
variants of the proposed method, QGFN, are able to improve on the number of
high-reward samples generated in a variety of tasks without sacrificing
diversity.
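
The abstract does not say how the GFN policy and Q are combined, so the following is only a minimal sketch of one plausible mixing scheme, not necessarily any of the paper's variants. It assumes a mixing parameter `p` in [0, 1]: with probability `p` the sampler takes the action with the highest Q value, and otherwise it samples from the GFN policy. The function name `mixed_policy_sample` and its arguments are hypothetical and chosen for illustration.

```python
import random


def mixed_policy_sample(gfn_probs, q_values, p, rng=random):
    """Pick an action index by mixing a GFN policy with a greedy Q policy.

    Assumed scheme (not taken from the abstract): with probability p,
    return the action with the largest Q value; otherwise sample an
    action from the GFN policy's probabilities.
    """
    if len(gfn_probs) != len(q_values):
        raise ValueError("gfn_probs and q_values must have the same length")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")

    actions = range(len(gfn_probs))
    if rng.random() < p:
        # Greedy branch: follow the action-value estimate.
        return max(actions, key=lambda a: q_values[a])
    # Diverse branch: follow the GFN sampling policy.
    return rng.choices(actions, weights=gfn_probs, k=1)[0]


if __name__ == "__main__":
    gfn_probs = [0.5, 0.3, 0.2]   # hypothetical GFN policy over 3 actions
    q_values = [0.1, 0.9, 0.4]    # hypothetical action-value estimates
    for p in (0.0, 0.5, 1.0):
        picks = [mixed_policy_sample(gfn_probs, q_values, p) for _ in range(1000)]
        print(p, [picks.count(a) for a in range(3)])
```

Under this assumption, `p = 0` recovers the plain GFN policy and `p = 1` gives a fully greedy policy, so `p` plays the role of the greediness control the abstract describes.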