Overfitting and Optimization in Offline Policy Learning

David Brandfonbrener
William F. Whitney
Rajesh Ranganath

Abstract:

We consider the task of policy learning from an offline dataset generated by some behavior policy. We analyze the two most prominent families of algorithms for this task: policy optimization and Q-learning. We demonstrate that policy optimization suffers from two problems, overfitting and spurious minima, that do not appear in Q-learnin...
