Guaranteed Matrix Completion via Non-convex Factorization.

IEEE Transactions on Information Theory, vol. 62, no. 11 (2016): 6535–6579

Cited by: 402 | Views: 298

Abstract

Matrix factorization is a popular approach for large-scale matrix completion. The optimization formulation based on matrix factorization, even at huge scale, can be solved very efficiently in practice by standard optimization algorithms. However, due to the non-convexity caused by the factorization model, there is a limited theoretical understanding of this formulation. …

Introduction
  • In the era of big data, there has been an increasing need for handling the enormous amount of data generated by mobile devices, sensors, online merchants, social networks, etc.
  • The authors partially answer the open question of whether standard algorithms succeed despite this non-convexity by showing that, under conditions similar to those used in previous works, many standard optimization algorithms for a factorization-based formulation (see (11)) converge to the true low-rank matrix.
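
The bullets above reference the factorization formulation (11)/(P1) without reproducing it. As a minimal sketch, assuming the commonly used objective min over X, Y of ½‖P_Ω(M − XY^⊤)‖_F² (the paper's actual (P1) may carry an additional regularization term), the sampled objective and one plain gradient step (cf. Algorithm 1) can be written as follows; `mask`, `lr`, and the function names are illustrative choices of this sketch:

```python
import numpy as np

def sampled_objective(M, mask, X, Y):
    """0.5 * ||P_Omega(M - X Y^T)||_F^2 over the revealed entries.

    Sketch only: the paper's (P1) may include a regularizer omitted here.
    """
    R = mask * (M - X @ Y.T)      # residual restricted to revealed entries
    return 0.5 * np.sum(R ** 2)

def gradient_step(M, mask, X, Y, lr=1e-2):
    """One plain gradient-descent step on both factors (cf. Algorithm 1)."""
    R = mask * (M - X @ Y.T)      # P_Omega(M - X Y^T)
    return X + lr * (R @ Y), Y + lr * (R.T @ X)
```

Here `mask` is a 0/1 matrix marking Ω, and the stepsize `lr` is a placeholder rather than the paper's rule.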
Highlights
  • In the convex approach based on nuclear norm minimization, the whole matrix is the optimization variable, and its nuclear norm, which can be viewed as a convex approximation of the rank, serves as the objective function or as a regularization term
  • Our result provides a validation of the matrix factorization based formulation itself, rather than a validation of any single algorithm
  • The first BCD-type algorithm we present is AltMin which, in the context of matrix completion, usually refers to the algorithm that alternates between the two factors, updating one factor at a time with the other fixed (see the sketch after this list)
  • The fourth algorithm we present is stochastic gradient descent [1, 21] tailored for our problem (P1)
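
A minimal sketch of the AltMin sweep referenced in the list above, assuming the usual row-wise least-squares form; the small `ridge` term is a numerical-safety assumption of this sketch, not part of the paper's algorithm:

```python
import numpy as np

def altmin_sweep(M, mask, X, Y, ridge=1e-8):
    """One AltMin sweep: exactly minimize over X with Y fixed, then over Y
    with X fixed. Each row update is a least-squares problem over the
    revealed entries of that row/column. Updates X and Y in place.
    """
    r = X.shape[1]
    for i in range(M.shape[0]):               # update row i of X
        idx = mask[i] > 0                     # revealed entries in row i
        if idx.any():
            Yi = Y[idx]                       # |Omega_i| x r
            X[i] = np.linalg.solve(Yi.T @ Yi + ridge * np.eye(r),
                                   Yi.T @ M[i, idx])
    for j in range(M.shape[1]):               # update row j of Y
        idx = mask[:, j] > 0                  # revealed entries in column j
        if idx.any():
            Xj = X[idx]
            Y[j] = np.linalg.solve(Xj.T @ Xj + ridge * np.eye(r),
                                   Xj.T @ M[idx, j])
    return X, Y
```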
Results
  • The authors remark that the results apply to other versions of SGD with different update orders or stepsize rules, as long as they converge to stationary points and satisfy one condition of Proposition V.1 (a sketch of one such SGD step follows this list).
  • The main result of this paper is that Algorithms 1-4 converge to the global optima of problem (P1) given in (11) and exactly reconstruct the true low-rank matrix with high probability, provided that the number of revealed entries is large enough.
  • Similar to the results for nuclear norm minimization [4,5,6,7], the probability is taken with respect to the random choice of Ω, and “with probability 99%” means that, out of all possible sets Ω of a given size, 99% of them lead to exact reconstruction by Algorithms 1-4 with certainty.
  • Unlike all previous works on AltMin for matrix completion, the result does not require the algorithm to use independent samples in different iterations.
  • To prove Theorem III.1, the authors only need to prove two lemmas which describe the properties of the problem formulation (P1) and the properties of the algorithms respectively.
  • Throughout the paper, “under the same condition of Lemma III.1” means: assume the setting defined by (9), with Ω generated uniformly at random with size ∣Ω∣ satisfying (17), where the numerical constants are the same as those in Lemma III.1.
  • Corollary III.1: Under the same conditions as Theorem III.1, any algorithm satisfying Properties (a) and (b) in Lemma III.2 reconstructs the matrix exactly with probability at least 1 − 2/n^4.
  • The paper also establishes an upper bound on ∥P_Ω(·)∥ applied to the product of the factor-error terms: the following result states that, for the quantities defined in Table VI, (25a) holds.
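
Proposition V.1 itself is not reproduced in this summary. As a sketch of the kind of SGD step the results above refer to, assuming one uniformly sampled revealed entry per step (the sampling order, stepsize rule, and any regularization in the paper's Algorithm 4 may differ):

```python
import numpy as np

def sgd_step(M, omega, X, Y, lr=0.05, rng=None):
    """Sample a revealed entry (i, j) from Omega and update row i of X and
    row j of Y with the gradient of 0.5 * (M[i, j] - <x_i, y_j>)^2.
    Illustrative sketch; not the paper's exact update rule.
    """
    rng = rng or np.random.default_rng()
    i, j = omega[rng.integers(len(omega))]    # random revealed entry
    err = M[i, j] - X[i] @ Y[j]               # residual at (i, j)
    xi = X[i].copy()                          # keep old row for Y's update
    X[i] += lr * err * Y[j]
    Y[j] += lr * err * xi
    return X, Y
```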
Conclusion
  • Property (a) in Lemma III.2 is a basic requirement for many reasonable algorithms and can be proved using classical results in optimization, so the difficulty mainly lies in how to prove Property (b).
  • The second condition means that the new point x_{t+1} is the minimizer of a convex tight upper bound of the original function along the direction x_{t+1} − x_t, and it holds for BCD-type methods such as Algorithm 2 and Algorithm 3 (sketched below).
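
In symbols (a sketch in generic iterate notation x_t, since the extracted bullet lost its subscripts), the condition reads:

```latex
% Sketch of the BSUM-type condition described above, in generic notation:
% u(., x_t) is a tight convex upper bound of the objective f at x_t, i.e.
%   u(x; x_t) >= f(x) for all x,  and  u(x_t; x_t) = f(x_t),
% and the new iterate minimizes u along the update direction:
\[
  u\big(x_{t+1};\, x_t\big) \;\le\; u\big(x_t + \alpha\,(x_{t+1} - x_t);\, x_t\big)
  \qquad \text{for all } \alpha \in [0, 1].
\]
```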
Tables
  • Table 1: Initialization procedure (INITIALIZE)
  • Table 2: Algorithm 4 (SGD)
  • Table 3: Algorithm 1 (Gradient descent)
  • Table 4: Algorithm 3 (Row BSUM)
  • Table 5: Definitions of auxiliary quantities
  • Table 6: Algorithm 2 (Two-block alternating minimization)
Funding
  • This work is supported in part by a Doctoral Dissertation Fellowship from the Graduate School of the University of Minnesota
References
  • [2] P. Chen and D. Suter, “Recovering the missing components in a large noisy low-rank matrix: Application to SFM,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 8, pp. 1051–1063, 2004.
  • [3] Z. Liu and L. Vandenberghe, “Interior-point method for nuclear norm approximation with application to system identification,” SIAM Journal on Matrix Analysis and Applications, vol. 31, no. 3, pp. 1235–1256, 2009.
  • [4] E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717–772, 2009.
  • [5] E. J. Candès and T. Tao, “The power of convex relaxation: Near-optimal matrix completion,” IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2053–2080, 2010.
  • [6] D. Gross, “Recovering low-rank matrices from few coefficients in any basis,” IEEE Transactions on Information Theory, vol. 57, no. 3, pp. 1548–1566, 2011.
  • [7] B. Recht, “A simpler approach to matrix completion,” Journal of Machine Learning Research, vol. 12, pp. 3413–3430, 2011.
  • [8] J.-F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
  • [11] A. Agarwal, S. Negahban, and M. J. Wainwright, “Fast global convergence of gradient methods for high-dimensional statistical recovery,” The Annals of Statistics, vol. 40, no. 5, pp. 2452–2482, 2012.
  • [12] K. Hou, Z. Zhou, A. M.-C. So, and Z.-Q. Luo, “On the linear convergence of the proximal gradient method for trace norm regularization,” in Advances in Neural Information Processing Systems (NIPS), 2013, pp. 710–718.
  • [15] H. Keshavan, “Efficient algorithms for collaborative filtering,” Ph.D. dissertation, Stanford University, 2012.
  • [16] P. Jain, P. Netrapalli, and S. Sanghavi, “Low-rank matrix completion using alternating minimization,” in Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC), 2013, pp. 665–674.
  • [17] M. Hardt, “Understanding alternating minimization for matrix completion,” in 2014 IEEE 55th Annual Symposium on Foundations of Computer Science (FOCS), 2014, pp. 651–660.
  • [20] Z. Wen, W. Yin, and Y. Zhang, “Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm,” Mathematical Programming Computation, vol. 4, no. 4, pp. 333–361, 2012.
  • [22] A. Paterek, “Improving regularized singular value decomposition for collaborative filtering,” in Proceedings of KDD Cup and Workshop, 2007, pp. 5–8.
  • [24] B. Recht and C. Ré, “Parallel stochastic gradient algorithms for large-scale matrix completion,” Mathematical Programming Computation, vol. 5, no. 2, pp. 201–226, 2013.
  • [25] Y. Zhuang, W.-S. Chin, Y.-C. Juan, and C.-J. Lin, “A fast parallel SGD for matrix factorization in shared memory systems,” in Proceedings of the 7th ACM Conference on Recommender Systems, 2013, pp. 249–256.
  • [26] I. Pilászy, D. Zibriczky, and D. Tikk, “Fast ALS-based matrix factorization for explicit and implicit feedback datasets,” in Proceedings of the 4th ACM Conference on Recommender Systems, 2010, pp. 71–78.
  • [27] H.-F. Yu, C.-J. Hsieh, S. Si, and I. S. Dhillon, “Scalable coordinate descent approaches to parallel matrix factorization for recommender systems,” in ICDM, 2012, pp. 765–774.
  • [28] T. Hastie, R. Mazumder, J. Lee, and R. Zadeh, “Matrix completion and low-rank SVD via fast alternating least squares,” arXiv preprint arXiv:1410.2596, 2014.
  • [29] R. Sun, “Matrix completion via nonconvex factorization: Algorithms and theory,” Ph.D. dissertation, University of Minnesota, 2015.
  • [30] R. H. Keshavan, A. Montanari, and S. Oh, “Matrix completion from a few entries,” IEEE Transactions on Information Theory, vol. 56, no. 6, pp. 2980–2998, 2010.
  • [31] E. J. Candès, X. Li, and M. Soltanolkotabi, “Phase retrieval via Wirtinger flow: Theory and algorithms,” arXiv preprint arXiv:1407.1065, 2014.
  • [32] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, “Quantum state tomography via compressed sensing,” arXiv preprint arXiv:0909.3304v1, 2009.
  • [33] P. Jain and P. Netrapalli, “Fast exact matrix completion with finite samples,” arXiv preprint arXiv:1411.1087, 2014.
  • [34] C. De Sa, K. Olukotun, and C. Ré, “Global convergence of stochastic gradient descent for some nonconvex matrix problems,” arXiv preprint arXiv:1411.1134, 2014.
  • [35] P. Netrapalli, P. Jain, and S. Sanghavi, “Phase retrieval using alternating minimization,” in Advances in Neural Information Processing Systems (NIPS), 2013, pp. 2796–2804.
  • [36] C.-H. Zhang and T. Zhang, “A general theory of concave regularization for high-dimensional sparse estimation problems,” Statistical Science, vol. 27, no. 4, pp. 576–593, 2012.
  • [37] P.-L. Loh and M. Wainwright, “Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima,” in Advances in Neural Information Processing Systems, 2013, pp. 476–484.
  • [38] J. Fan, L. Xue, and H. Zou, “Strong oracle optimality of folded concave penalized estimation,” The Annals of Statistics, vol. 42, no. 3, pp. 819–849, 2014.
  • [39] X.-T. Yuan and T. Zhang, “Truncated power method for sparse eigenvalue problems,” Journal of Machine Learning Research, vol. 14, no. 1, pp. 899–925, 2013.
  • [40] Z. Wang, H. Lu, and H. Liu, “Nonconvex statistical optimization: Minimax-optimal sparse PCA in polynomial time,” arXiv preprint arXiv:1408.5352, 2014.
  • [41] P. Netrapalli, U. Niranjan, S. Sanghavi, A. Anandkumar, and P. Jain, “Non-convex robust PCA,” in Advances in Neural Information Processing Systems, 2014, pp. 1107–1115.
  • [42] S. Balakrishnan, M. Wainwright, and B. Yu, “Statistical guarantees for the EM algorithm: From population to sample-based analysis,” arXiv preprint arXiv:1408.2156, 2014.
  • [43] Z. Wang, Q. Gu, Y. Ning, and H. Liu, “High dimensional expectation-maximization algorithm: Statistical optimization and asymptotic normality,” arXiv preprint arXiv:1412.8729, 2014.
  • [44] U. Feige and E. Ofek, “Spectral techniques applied to sparse random graphs,” Random Structures & Algorithms, vol. 27, no. 2, pp. 251–275, 2005.
  • [46] Y. Chen, S. Bhojanapalli, S. Sanghavi, and R. Ward, “Coherent matrix completion,” in Proceedings of the 31st International Conference on Machine Learning (ICML), 2014, pp. 674–682.
  • [47] S. Bhojanapalli and P. Jain, “Universal matrix completion,” arXiv preprint arXiv:1402.2324, 2014.
  • [48] W. I. Zangwill, “Non-linear programming via penalty functions,” Management Science, vol. 13, no. 5, pp. 344–358, 1967.
  • [49] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, 1999.
  • [50] P. Tseng, “Convergence of a block coordinate descent method for nondifferentiable minimization,” Journal of Optimization Theory and Applications, vol. 109, no. 3, pp. 475–494, 2001.
  • [51] Y. Nesterov, “Efficiency of coordinate descent methods on huge-scale optimization problems,” SIAM Journal on Optimization, vol. 22, no. 2, pp. 341–362, 2012.
  • [52] M. Razaviyayn, M. Hong, and Z.-Q. Luo, “A unified convergence analysis of block successive minimization methods for nonsmooth optimization,” SIAM Journal on Optimization, vol. 23, no. 2, pp. 1126–1153, 2013.
  • [53] H. Baligh, M. Hong, W.-C. Liao, Z.-Q. Luo, M. Razaviyayn, M. Sanjabi, and R. Sun, “Cross-layer provision of future cellular networks: A WMMSE-based approach,” IEEE Signal Processing Magazine, vol. 31, no. 6, pp. 56–68, 2014.
  • [54] M. Hong, R. Sun, H. Baligh, and Z.-Q. Luo, “Joint base station clustering and beamformer design for partial coordinated transmission in heterogeneous networks,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 226–240, 2013.
  • [56] R. Sun, Z.-Q. Luo, and Y. Ye, “On the expected convergence of randomly permuted ADMM,” arXiv preprint arXiv:1503.06387, 2015.
  • [58] D. P. Bertsekas and J. N. Tsitsiklis, “Gradient convergence in gradient methods with errors,” SIAM Journal on Optimization, vol. 10, no. 3, pp. 627–642, 2000.
  • [59] G. W. Stewart, “Perturbation theory for the singular value decomposition,” 1998.