Projection Efficient Subgradient Method and Optimal Nonsmooth Frank-Wolfe Method

NIPS 2020.

Keywords: Lipschitz continuous, projected subgradient method, minimization oracle, efficient subgradient, FO calls, support vector machines

Abstract:

We consider the classical setting of optimizing a nonsmooth Lipschitz continuous convex function over a convex constraint set, when having access to a (stochastic) first-order oracle (FO) for the function and a projection oracle (PO) for the constraint set. It is well known that, to achieve $\epsilon$-suboptimality in high dimensions, $\Theta(\epsilon^{-2})$ ...

Introduction
  • When queried at a point x, FO returns a subgradient of f at x and PO returns the projection of x onto X.
  • Finding an ε-suboptimal solution for this problem requires Ω(ε⁻²) FO calls in the worst case, when the dimension d is large [64].
  • This lower bound is tightly matched by the projected subgradient method (PGD).
  • PGD uses one PO call after every FO call, resulting in O(ε⁻²) PO calls as well (see the sketch following this list).
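
To make the oracle accounting concrete, here is a minimal PGD sketch (illustrative, not code from the paper); `subgradient_oracle` and `project` are hypothetical stand-ins for the FO and PO. Each iteration issues exactly one FO call followed by one PO call, so the O(ε⁻²) iterations needed for ε-suboptimality translate into O(ε⁻²) PO calls as well.

```python
import numpy as np

def pgd(subgradient_oracle, project, x0, G, R, num_iters):
    """Projected subgradient method (PGD) -- illustrative sketch.

    subgradient_oracle(x) -- FO call: returns some g in the subdifferential of f at x.
    project(x)            -- PO call: returns the Euclidean projection of x onto X.
    G, R                  -- Lipschitz constant of f and a bound on the diameter of X.
    Returns the running average of the iterates, which is roughly
    O(G * R / sqrt(num_iters))-suboptimal (up to log factors).
    """
    x = project(np.asarray(x0, dtype=float))
    avg = x.copy()
    for t in range(1, num_iters + 1):
        g = subgradient_oracle(x)           # one FO call
        step = R / (G * np.sqrt(t))         # standard diminishing step size
        x = project(x - step * g)           # one PO call
        avg += (x - avg) / (t + 1)          # running average of x_0, ..., x_t
    return avg

# Hypothetical usage: minimize f(x) = ||A @ x - b||_1 over the Euclidean ball of radius R.
A, b, R = np.random.randn(20, 5), np.random.randn(20), 1.0
fo = lambda x: A.T @ np.sign(A @ x - b)          # a subgradient of f at x
po = lambda x: x if np.linalg.norm(x) <= R else R * x / np.linalg.norm(x)
x_hat = pgd(fo, po, np.zeros(5), G=np.abs(A).sum(), R=R, num_iters=2000)  # crude G bound
```
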
Highlights
  • In this paper, we consider the nonsmooth convex optimization (NSCO) problem with the first-order oracle (FO) and the projection oracle (PO) defined as:

    NSCO: min f(x) s.t. x ∈ X,  with  FO(x) ∈ ∂f(x)  and  PO(x) = P_X(x) = argmin_{y ∈ X} ‖y − x‖,   (1)

    where f : R^d → R is a convex Lipschitz-continuous function and X ⊆ R^d is a convex constraint set (a concrete instantiation of these oracles is sketched after this list).
  • The cost of a PO call is often higher than that of an FO call. This begs a natural question, which surprisingly is largely unexplored in the general nonsmooth optimization setting: can we design an algorithm whose PO calls complexity is significantly better than the optimal FO calls complexity O(ε⁻²)?
  • We introduce MOreau Projection Efficient Subgradient (MOPES) and show that it is guaranteed to find an ε-suboptimal solution for any constrained nonsmooth convex optimization problem using O(ε⁻¹) PO calls and the optimal O(ε⁻²) stochastic first-order oracle (SFO) calls.
  • Our MOPES method guarantees a significantly better PO-CC than the projected subgradient method (PGD), while remaining independent of the dimension.
  • We assume that the function is accessed with a first-order oracle (FO) and the set is accessed with either a projection oracle (PO) or a linear minimization oracle (LMO).
  • We introduce MOPES, and show that it finds an ε-suboptimal solution with O(ε⁻²) FO calls and O(ε⁻¹) PO calls.
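
As a concrete illustration (ours, not code from the paper) of these oracles, the sketch below instantiates the FO for a hinge-loss objective and the PO and LMO for an ℓ1-ball constraint, the setting of the SVM experiments summarized in Table 3. Note that the LMO only has to locate the largest-magnitude coordinate of its query, whereas the exact Euclidean projection onto the ℓ1 ball requires a sorting-based routine [26], which is the usual reason LMO-based (Frank-Wolfe type) methods are attractive for such sets.

```python
import numpy as np

def fo_hinge(w, X, y):
    """FO: one subgradient of the nonsmooth hinge loss f(w) = mean(max(0, 1 - y * (X @ w)))."""
    active = (1.0 - y * (X @ w) > 0).astype(float)   # examples with a positive hinge term
    return -(X.T @ (active * y)) / len(y)

def lmo_l1_ball(g, radius):
    """LMO: argmin_{||v||_1 <= radius} <g, v>, attained at a signed vertex; costs O(d)."""
    i = int(np.argmax(np.abs(g)))
    v = np.zeros_like(g, dtype=float)
    v[i] = -radius * np.sign(g[i]) if g[i] != 0 else radius
    return v

def po_l1_ball(w, radius):
    """PO: exact Euclidean projection onto the l1 ball via the sorting method of [26]."""
    if np.abs(w).sum() <= radius:
        return w
    u = np.sort(np.abs(w))[::-1]                     # |w| sorted in decreasing order
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(w) + 1) > css - radius)[0][-1]
    theta = (css[k] - radius) / (k + 1.0)            # soft-thresholding level
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)
```
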
Results
  • The authors present the main results: first the main ideas (Section 3.1 of the paper), and then the results for the PO and LMO settings (Sections 3.2 and 3.3, respectively).

    The authors are interested in the NSCO problem (1).
  • In Figure 2, the authors plot the mean suboptimality gap f − f* of the iterates against the number of LMO and FO calls, respectively, used to obtain each iterate.
  • In both of these plots, while MOPES/MOLES and the baselines have comparable FO-CC, MOPES/MOLES is significantly more efficient in the number of PO/LMO calls, matching Theorems 1 and 2.
Conclusion
  • The authors study a canonical problem in optimization: minimizing a nonsmooth Lipschitz continuous convex function over a convex constraint set.
  • The authors assume that the function is accessed with a first-order oracle (FO) and the set is accessed with either a projection oracle (PO) or a linear minimization oracle (LMO).
  • The authors introduce MOLES, and show that it finds an ε-suboptimal solution with O(ε⁻²) FO and LMO calls.
  • Together with MOPES, these guarantees are optimal in both the number of PO calls and the number of LMO calls.
  • This resolves a question left open since [84] on designing an optimal Frank-Wolfe type algorithm for nonsmooth functions (the classical smooth Frank-Wolfe step is sketched below for contrast).
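
For contrast with the nonsmooth setting studied here, below is a sketch (ours) of the classical Frank-Wolfe / conditional gradient step [28] that the term "Frank-Wolfe type" refers to: each iteration uses one FO call and one LMO call and never projects, but its standard O(1/t) guarantee requires a smooth (gradient-Lipschitz) objective. Achieving the optimal O(ε⁻²) FO and LMO complexity with only a subgradient oracle is what MOLES provides.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, num_iters):
    """Classical (smooth) Frank-Wolfe / conditional gradient -- illustrative sketch.

    grad(x) -- FO call: gradient of a smooth objective at x.
    lmo(g)  -- LMO call: returns argmin_{s in X} <g, s>.
    x0 must lie in X; every iterate stays in X as a convex combination of points of X.
    """
    x = np.asarray(x0, dtype=float)
    for t in range(num_iters):
        s = lmo(grad(x))                    # one FO call + one LMO call, no projection
        gamma = 2.0 / (t + 2.0)             # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s   # move toward the LMO vertex
    return x

# Hypothetical usage with the l1-ball LMO sketched earlier and a smooth quadratic:
# x_min = frank_wolfe(lambda x: Q @ x - c, lambda g: lmo_l1_ball(g, radius=1.0),
#                     x0=np.zeros(d), num_iters=1000)
```
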
Tables
  • Table 1: Comparison of the SFO (3), PO (1) and LMO (2) calls complexities of our methods and state-of-the-art algorithms, and the corresponding lower bounds, for finding an approximate minimizer of a d-dimensional NSCO problem (1). We assume that f is convex and G-Lipschitz continuous, and is accessed through a stochastic subgradient oracle with variance σ². ∗requires using a minibatch of appropriate size; †approximates the projections of PGD with the FW method (FW-PGD, see Appendix B.2).
  • Table 2: Projection oracle: comparison of PO and SFO calls complexities (PO-CC and SFO-CC).
  • Table 3: Linear minimization oracle: LMO and SFO calls complexities (LMO-CC and SFO-CC) of various methods for a d-dimensional ℓ1-norm-constrained SVM with n training samples. SFO uses a batch size of b = o(n). SP+VR-MP combines ideas from the Semi-Proximal [41] and Variance-Reduced [16] Mirror-Prox methods. Our MOLES outperforms other nonsmooth methods in LMO-CC while still maintaining O(ε⁻²) SFO-CC. The complexities of methods based on the smooth minimax reformulation scale adversely with n or d.
Related work
  • Nonsmooth convex optimization: Nonsmooth convex optimization has been the focal point of several research works over the past few decades. [64] provided an information-theoretic lower bound of Ω(ε⁻²) FO calls to obtain an ε-suboptimal solution for the general problem. This bound is matched by the PGD method, introduced independently by [34] and [59], which also implies a PO-CC of O(ε⁻²). Recently, several faster PGD-style methods [50, 78, 87, 48] have been proposed that exploit additional structure in the objective, e.g., when the function is the sum of a smooth function and a nonsmooth function for which a proximal operator is available [8]. But, to the best of our knowledge, such works do not explicitly address PO-CC and are mainly concerned with optimizing FO-CC. Thus, for worst-case nonsmooth functions, these methods still suffer from O(ε⁻²) PO-CC.

    Smoothed surrogates: Smoothing the nonsmooth function is another common approach to solving such problems [62, 66]. In particular, randomized smoothing [27, 9] techniques have been successful in bringing down the FO-CC with respect to ε, but such improvements come at the cost of dimension-dependent factors. For example, [27, Corollary 2.4] provides a randomized smoothing method that has O(d^{1/4}/ε) PO-CC and O(ε⁻²) FO-CC. Our MOPES method guarantees a significantly better PO-CC than PGD, while remaining independent of the dimension.
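
For reference, the smooth surrogate underlying Moreau-type approaches (and the namesake of MOPES) is the Moreau envelope; the display below is the standard textbook definition, not the paper's specific construction. For a convex, G-Lipschitz f and smoothing parameter λ > 0,

$$ f_{\lambda}(x) \;=\; \min_{y \in \mathbb{R}^d} \Big\{ f(y) + \tfrac{1}{2\lambda}\,\lVert y - x \rVert^2 \Big\}, \qquad \nabla f_{\lambda}(x) \;=\; \tfrac{1}{\lambda}\big(x - \mathrm{prox}_{\lambda f}(x)\big). $$

The envelope $f_{\lambda}$ is convex, $(1/\lambda)$-smooth, and satisfies $f_{\lambda}(x) \le f(x) \le f_{\lambda}(x) + \lambda G^2/2$, so taking $\lambda$ on the order of $\epsilon/G^2$ turns the nonsmooth problem into an O(ε)-accurate smooth surrogate amenable to accelerated gradient methods; how MOPES builds on this smoothing is detailed in the paper's Section 3.
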
Reference
  • J.-B. Alayrac, P. Bojanowski, N. Agrawal, J. Sivic, I. Laptev, and S. Lacoste-Julien. Unsupervised learning from narrated instruction videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4575–4583, 2016.
  • B. Amos, L. Xu, and J. Z. Kolter. Input convex neural networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 146–155. JMLR. org, 2017.
  • F. Bach. Duality between subgradient and conditional gradient methods. SIAM Journal on Optimization, 25(1):115–129, 2015.
  • F. Bach, R. Jenatton, J. Mairal, G. Obozinski, et al. Optimization with sparsity-inducing penalties. Foundations and Trends® in Machine Learning, 4(1):1–106, 2012.
  • K. Balasubramanian and S. Ghadimi. Zeroth-order (non)-convex stochastic optimization via conditional gradient and gradient updates. In Advances in Neural Information Processing Systems, pages 3455–3464, 2018.
  • N. Bansal and A. Gupta. Potential-function proofs for first-order methods. arXiv preprint arXiv:1712.04581, 2017.
  • H. H. Bauschke, M. N. Dao, and S. B. Lindstrom. Regularizing with bregman–moreau envelopes. SIAM Journal on Optimization, 28(4):3208–3228, 2018.
  • A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences, 2(1):183–202, 2009.
  • A. Beck and M. Teboulle. Smoothing and first order methods: A unified framework. SIAM Journal on Optimization, 22(2):557–580, 2012.
  • A. Ben-Tal, S. Bhadra, C. Bhattacharyya, and A. Nemirovski. Efficient methods for robust classification under uncertainty in kernel matrices. Journal of Machine Learning Research, 13 (Oct):2923–2954, 2012.
  • D. P. Bertsekas. Nonlinear Programming. Athena Scientific Belmont, 2 edition, 1999.
  • C. M. Bishop. Pattern recognition and machine learning. springer, 2006.
  • P. S. Bradley and O. L. Mangasarian. Feature selection via concave minimization and support vector machines. In ICML, volume 98, pages 82–90, 1998.
  • G. Braun, S. Pokutta, and D. Zink. Lazifying conditional gradient algorithms. Journal of Machine Learning Research, 20(71):1–42, 2019.
  • J.-F. Cai, E. J. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on optimization, 20(4):1956–1982, 2010.
  • Y. Carmon, Y. Jin, A. Sidford, and K. Tian. Variance reduction for matrix games. In Advances in Neural Information Processing Systems, pages 11377–11388, 2019.
  • J. Chen, T. Yang, Q. Lin, L. Zhang, and Y. Chang. Optimal stochastic strongly convex optimization with a logarithmic number of projections. arXiv preprint arXiv:1304.5504, 2013.
  • L. Chen, C. Harshaw, H. Hassani, and A. Karbasi. Projection-free online optimization with stochastic gradient: From convexity to submodularity. In International Conference on Machine Learning, pages 814–823, 2018.
  • Y. Chen, Y. Shi, and B. Zhang. Optimal control via neural networks: A convex approach. In International Conference on Learning Representations, 2018.
  • Y. Chen, Y. Shi, and B. Zhang. Input convex neural networks for optimal voltage regulation. arXiv preprint arXiv:2002.08684, 2020.
  • A. Clark and Contributors. Pillow: Python image-processing library, 2020. URL https://pillow.readthedocs.io/en/stable/. Documentation.
  • K. L. Clarkson. Coresets, sparse greedy approximation, and the frank-wolfe algorithm. ACM Transactions on Algorithms (TALG), 6(4):1–30, 2010.
  • B. Cox, A. Juditsky, and A. Nemirovski. Decomposition techniques for bilinear saddle point problems and variational inequalities with affine monotone operators. Journal of Optimization Theory and Applications, 172(2):402–435, 2017.
  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255.
  • O. Devolder, F. Glineur, and Y. Nesterov. First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming, 146(1-2):37–75, 2014.
  • J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l 1-ball for learning in high dimensions. In Proceedings of the 25th international conference on Machine learning, pages 272–279, 2008.
  • J. C. Duchi, P. L. Bartlett, and M. J. Wainwright. Randomized smoothing for stochastic optimization. SIAM Journal on Optimization, 22(2):674–701, 2012.
  • M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval research logistics quarterly, 3(1-2):95–110, 1956.
  • R. M. Freund and P. Grigas. New analysis and results for the frank–wolfe method. Mathematical Programming, 155(1-2):199–230, 2016.
  • D. Garber and E. Hazan. A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization. arXiv preprint arXiv:1301.4666, 2013.
  • D. Garber and E. Hazan. Faster rates for the frank-wolfe method over strongly-convex sets. In 32nd International Conference on Machine Learning, ICML 2015, 2015.
  • G. Gidel, T. Jebara, and S. Lacoste-Julien. Frank-wolfe algorithms for saddle point problems. In Artificial Intelligence and Statistics, pages 362–371. PMLR, 2017.
  • G. Gidel, F. Pedregosa, and S. Lacoste-Julien. Frank-wolfe splitting via augmented lagrangian method. In International Conference on Artificial Intelligence and Statistics, pages 1456–1465, 2018.
  • A. A. Goldstein. Convex programming in hilbert space. Bulletin of the American Mathematical Society, 70(5):709–710, 1964.
  • J. H. Hammond. Solving asymmetric variational inequality problems and systems of equations with generalized nonlinear programming algorithms. PhD thesis, Massachusetts Institute of Technology, 1984.
  • Z. Harchaoui, A. Juditsky, and A. Nemirovski. Conditional gradient algorithms for normregularized smooth convex optimization. Mathematical Programming, 152(1-2):75–112, 2015.
  • H. Hassani, A. Karbasi, A. Mokhtari, and Z. Shen. Stochastic conditional gradient++: (non-)convex minimization and continuous submodular maximization. arXiv preprint arXiv:1902.06992, 2019.
  • E. Hazan and S. Kale. Projection-free online learning. In Proceedings of the 29th International Coference on International Conference on Machine Learning, pages 1843–1850, 2012.
  • E. Hazan and H. Luo. Variance-reduced and projection-free stochastic optimization. In International Conference on Machine Learning, pages 1263–1271, 2016.
  • E. Hazan and E. Minasyan. Faster projection-free online learning. arXiv preprint arXiv:2001.11568, 2020.
  • N. He and Z. Harchaoui. Semi-proximal mirror-prox for nonsmooth composite minimization. In Advances in Neural Information Processing Systems, pages 3411–3419, 2015.
  • N. He and Z. Harchaoui. Stochastic semi-proximal mirror-prox. Workshop on Optimization for Machine Learning, 2015. URL https://opt-ml.org/papers/OPT2015_paper_27.pdf.
  • J. Howard. Imagenette, 2019. URL https://github.com/fastai/imagenette. Github repository with links to dataset.
  • P. J. Huber. Robust statistical procedures, volume 68. SIAM, 1996.
  • M. Jaggi. Revisiting frank-wolfe: Projection-free sparse convex optimization. In Proceedings of the 30th international conference on machine learning, pages 427–435, 2013.
  • P. Jain, O. D. Thakkar, and A. Thakurta. Differentially private matrix completion revisited. In International Conference on Machine Learning, pages 2215–2224. PMLR, 2018.
  • B. Kulis, M. A. Sustik, and I. S. Dhillon. Low-rank kernel learning with bregman matrix divergences. Journal of Machine Learning Research, 10(Feb):341–376, 2009.
  • A. Kundu, F. Bach, and C. Bhattacharya. Convex optimization over intersection of simple sets: improved convergence rate guarantees via an exact penalty approach. In International Conference on Artificial Intelligence and Statistics, pages 958–967. PMLR, 2018.
  • S. Lacoste-Julien. Convergence rate of frank-wolfe for non-convex objectives. arXiv preprint arXiv:1607.00345, 2016.
  • S. Lacoste-Julien, M. Schmidt, and F. Bach. A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method. arXiv preprint arXiv:1212.2002, 2012.
  • S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher. Block-coordinate frank-wolfe optimization for structural svms. In Proceedings of the 30th international conference on machine learning, pages 53–61, 2013.
  • J. Lafond, H.-T. Wai, and E. Moulines. On the online frank-wolfe algorithms for convex and non-convex optimizations. arXiv preprint arXiv:1510.01171, 2015.
  • G. Lan. An optimal method for stochastic composite optimization. Mathematical Programming, 133(1-2):365–397, 2012.
  • G. Lan. The complexity of large-scale convex programming under a linear optimization oracle. arXiv preprint arXiv:1309.5550, 2013.
  • G. Lan. Gradient sliding for composite optimization. Mathematical Programming, 159(1-2): 201–235, 2016.
  • G. Lan and Y. Zhou. Conditional gradient sliding for convex optimization. SIAM Journal on Optimization, 26(2):1379–1409, 2016.
  • G. Lan, Z. Lu, and R. D. Monteiro. Primal-dual first-order methods with O(1/ε) iteration-complexity for cone programming. Mathematical Programming, 126(1):1–29, 2011.
  • G. Lan, S. Pokutta, Y. Zhou, and D. Zink. Conditional accelerated lazy stochastic gradient descent. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1965–1974, 2017.
  • E. S. Levitin and B. T. Polyak. Constrained minimization methods. USSR Computational mathematics and mathematical physics, 6(5):1–50, 1966.
  • F. Locatello, A. Yurtsever, O. Fercoq, and V. Cevher. Stochastic frank-wolfe for composite convex minimization. In Advances in Neural Information Processing Systems, pages 14246– 14256, 2019.
  • M. Mahdavi, T. Yang, R. Jin, S. Zhu, and J. Yi. Stochastic gradient descent with only one projection. In Advances in Neural Information Processing Systems, pages 494–502, 2012.
  • J. J. Moreau. Fonctions convexes duales et points proximaux dans un espace hilbertien. CR Acad. Sci. Paris Ser. A Math., 255:2897–2899, 1962.
  • J.-J. Moreau. Proximité et dualité dans un espace hilbertien. Bulletin de la Société mathématique de France, 93:273–299, 1965.
  • A. S. Nemirovski and D. B. Yudin. Problem complexity and method efficiency in optimization. Wiley-Interscience, 1 edition, 1983.
  • Y. Nesterov. Introductory lectures on convex programming volume I: Basic course. Lecture notes, 1998.
  • Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical programming, 103 (1):127–152, 2005.
  • Y. Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1):125–161, 2013.
  • Y. Nesterov. Complexity bounds for primal-dual methods minimizing the model of objective function. Mathematical Programming, 171(1-2):311–330, 2018.
  • Y. E. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k2). In Dokl. akad. nauk Sssr, volume 269, pages 543–547, 1983.
  • Q. Nguyen. Efficient learning with soft label information and multiple annotators. PhD thesis, University of Pittsburgh, 2014.
  • B. Palaniappan and F. Bach. Stochastic variance reduction methods for saddle-point problems. In Advances in Neural Information Processing Systems, pages 1416–1424, 2016.
  • F. Pierucci, Z. Harchaoui, and J. Malick. A smoothing approach for composite conditional gradient with nonsmooth loss. Technical report, [Research Report] RR-8662, INRIA Grenoble, 2014.
  • S. N. Ravi, M. D. Collins, and V. Singh. A deterministic nonsmooth frank wolfe algorithm with coreset guarantees. Informs Journal on Optimization, 1(2):120–142, 2019.
  • M. I. Razzak. Sparse support matrix machines for the classification of corrupted data. PhD thesis, Queensland University of Technology, 2019.
  • S. J. Reddi, S. Sra, B. Póczos, and A. Smola. Stochastic frank-wolfe methods for nonconvex optimization. In 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1244–1251. IEEE, 2016.
  • A. K. Sahu, M. Zaheer, and S. Kar. Towards gradient free and projection free stochastic optimization. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 3468–3477, 2019.
  • M. Schmidt, N. L. Roux, and F. R. Bach. Convergence rates of inexact proximal-gradient methods for convex optimization. In Advances in neural information processing systems, pages 1458–1466, 2011.
  • O. Shamir and T. Zhang. Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes. In International conference on machine learning, pages 71–79, 2013.
  • N. Srebro, J. Rennie, and T. S. Jaakkola. Maximum-margin matrix factorization. In Advances in neural information processing systems, pages 1329–1336, 2005.
  • K. K. Thekumparampil, P. Jain, P. Netrapalli, and S. Oh. Efficient algorithms for smooth minimax optimization. In Advances in Neural Information Processing Systems, pages 12659– 12670, 2019.
  • P. Tseng. Accelerated proximal gradient methods for convex optimization. Technical report, University of Washington, Seattle, 2008. URL https://www.mit.edu/~dimitrib/PTseng/papers/apgm.pdf.
  • R. Vinter and H. Zheng. Some finance problems solved with nonsmooth optimization techniques. Journal of optimization theory and applications, 119(1):1–18, 2003.
  • Z. Wang, X. He, D. Gao, and X. Xue. An efficient kernel-based matrixized least squares support vector machine. Neural Computing and Applications, 22(1):143–150, 2013.
  • D. White. Extension of the frank-wolfe algorithm to concave nondifferentiable objective functions. Journal of optimization theory and applications, 78(2):283–301, 1993.
  • L. Wolf, H. Jhuang, and T. Hazan. Modeling appearances with low-rank svm. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–6. IEEE, 2007.
  • J. Xie, Z. Shen, C. Zhang, B. Wang, and H. Qian. Efficient projection-free online methods with stochastic recursive gradient. In AAAI, pages 6446–6453, 2020.
  • T. Yang and Q. Lin. RSG: Beating subgradient method without smoothness and strong convexity. The Journal of Machine Learning Research, 19(1):236–268, 2018.
  • T. Yang, Q. Lin, and L. Zhang. A richer theory of convex constrained optimization with reduced projections and improved rates. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 3901–3910. JMLR. org, 2017.
  • I. E.-H. Yen, X. Lin, J. Zhang, P. Ravikumar, and I. Dhillon. A convex atomic-norm approach to multiple sequence alignment and motif discovery. In International Conference on Machine Learning, pages 2272–2280, 2016.
  • K. Yosida. Functional analysis. Springer Verlag, 1965.
  • L. Zhang, T. Yang, R. Jin, and X. He. O(log t) projections for stochastic optimization of smooth and strongly convex functions. In International Conference on Machine Learning, pages 1121–1129, 2013.
  • T. Zhang. Sequential greedy approximation for certain convex optimization problems. IEEE Transactions on Information Theory, 49(3):682–691, 2003.
  • J. Zhu, S. Rosset, R. Tibshirani, and T. J. Hastie. 1-norm support vector machines. In Advances in neural information processing systems, pages 49–56, 2004.