Optimal Smoothed Variable Sample-size Accelerated Proximal Methods for Structured Nonsmooth Stochastic Convex Programs

arXiv: Optimization and Control (2018)

Abstract
We consider a class of structured nonsmooth stochastic convex programs. Traditional stochastic approximation schemes in nonsmooth regimes are hindered by a convergence rate of $\mathcal{O}(1/\sqrt{k})$, compared with linear and sublinear (specifically $\mathcal{O}(1/k^2)$) rates in deterministic strongly convex and convex regimes, respectively. One avenue for addressing these gaps in the rates is to use an increasing batch size of gradients at each step, as adopted in the seminal paper by Ghadimi and Lan, where the optimal rate of $\mathcal{O}(1/k^2)$ and the optimal oracle complexity of $\mathcal{O}(1/\epsilon^2)$ were established in the smooth convex regime. Inspired by the work of Ghadimi and Lan and by extending our prior work, we make several contributions in the present paper. (I) Strongly convex $f$. Here, we develop a variable sample-size accelerated proximal method (VS-APM) for which the number of proximal evaluations needed to obtain an $\epsilon$-solution is shown to be $\mathcal{O}(\sqrt{\kappa}\,\log(1/\epsilon))$ while the oracle complexity is $\mathcal{O}(\sqrt{\kappa}/\epsilon)$, both of which are optimal; here $\kappa$ denotes the condition number. (II) Convex and nonsmooth $f$. In this setting, we develop an iterative smoothing extension of (VS-APM), referred to as (sVS-APM), in which the sample average of gradients of a smoothed function is employed at every step. By suitably choosing the smoothing, steplength, and batch-size sequences, we prove that the expected sub-optimality diminishes to zero at the rate of $\mathcal{O}(1/k)$ and admits the optimal oracle complexity of $\mathcal{O}(1/\epsilon^2)$. (III) Convex $f$. Finally, we show that (sVS-APM) and (VS-APM) produce sequences that converge almost surely to a solution of the original problem.
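
To make the variable sample-size idea concrete: the scheme replaces the exact gradient in an accelerated proximal step with a mini-batch average whose batch size grows across iterations, so that the gradient noise decays fast enough to preserve the deterministic rate. The Python sketch below is a minimal illustration under assumed choices (FISTA-style momentum, a geometric batch-growth factor, and a known Lipschitz constant `L`); the names `vs_apm`, `sample_grad`, and `batch_growth` are ours, and the paper's actual steplength, momentum, and batch-size sequences should be taken from the paper itself.

```python
import numpy as np

def prox_l1(v, step):
    """Prox of step * ||.||_1 (soft-thresholding); stands in for a generic prox."""
    return np.sign(v) * np.maximum(np.abs(v) - step, 0.0)

def vs_apm(sample_grad, prox, x0, L, num_iters, batch_growth=1.05):
    """Illustrative variable sample-size accelerated proximal loop
    (not the paper's exact parameter sequences).

    sample_grad(y, n) -- average of n stochastic gradient samples at y
    prox(v, step)     -- proximal operator of the nonsmooth term
    L                 -- Lipschitz constant of the smooth part's gradient
    """
    x = y = np.asarray(x0, dtype=float).copy()
    t = 1.0
    for k in range(num_iters):
        n_k = int(np.ceil(batch_growth ** k))            # increasing batch size
        g = sample_grad(y, n_k)                          # sample-average gradient
        x_next = prox(y - g / L, 1.0 / L)                # proximal gradient step
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # Nesterov momentum
        x, t = x_next, t_next
    return x

# Toy usage: min_x E[0.5 * ||A x - (b + noise)||^2] + lam * ||x||_1
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
lam = 0.1
L = np.linalg.norm(A.T @ A, 2)                           # spectral-norm bound

def sample_grad(y, n):
    noise = rng.standard_normal((n, 50)).mean(axis=0)    # variance shrinks with n
    return A.T @ (A @ y - (b + noise))

x_hat = vs_apm(sample_grad, lambda v, s: prox_l1(v, lam * s), np.zeros(10), L, 200)
```

For the nonsmooth case (sVS-APM), the abstract indicates the same kind of loop is run on a smoothed surrogate, e.g. with the nonsmooth term replaced by a smooth approximation whose smoothing parameter is driven to zero across iterations; the precise coupling of the smoothing, steplength, and batch-size sequences is what yields the $\mathcal{O}(1/k)$ rate and $\mathcal{O}(1/\epsilon^2)$ oracle complexity stated above.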