Making the Last Iterate of SGD Information Theoretically Optimal

COLT, pp. 1752-1755, 2019.


Abstract:

Stochastic gradient descent (SGD) is one of the most widely used algorithms for large-scale optimization problems. While classical theoretical analysis of SGD for convex problems studies (suffix) averages of iterates and obtains information-theoretically optimal bounds on suboptimality, the last point of SGD is, by far, the mo...
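To make the contrast in the abstract concrete, here is a minimal sketch, not the paper's algorithm or step-size scheme, of SGD on a simple least-squares objective, comparing the last iterate with a suffix average over the second half of the trajectory. The problem setup, step sizes, and suffix fraction are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's method): SGD on a least-squares
# objective f(w) = E[(x^T w - y)^2], reporting both the last iterate
# and a suffix average over the final half of the trajectory.
rng = np.random.default_rng(0)
d, T = 10, 5000
w_star = rng.normal(size=d)              # ground-truth parameters (assumed setup)

w = np.zeros(d)
suffix_sum, suffix_count = np.zeros(d), 0
for t in range(1, T + 1):
    x = rng.normal(size=d)               # draw one data point
    y = x @ w_star + 0.1 * rng.normal()  # noisy label
    grad = 2.0 * (x @ w - y) * x         # stochastic gradient of (x^T w - y)^2
    w -= 0.05 / np.sqrt(t) * grad        # eta_t ~ 1/sqrt(t), small constant for stability
    if t > T // 2:                       # accumulate the suffix average
        suffix_sum += w
        suffix_count += 1

w_suffix = suffix_sum / suffix_count
print("last-iterate error  :", np.linalg.norm(w - w_star))
print("suffix-average error:", np.linalg.norm(w_suffix - w_star))
```

The classical guarantees the abstract refers to apply to the suffix-averaged point w_suffix, while practitioners typically deploy the final iterate w; the paper studies how to make that last iterate itself achieve optimal suboptimality bounds.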
