Reward Model Learning Vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

ICML 2024
