# Guaranteed Matrix Completion via Non-convex Factorization

IEEE Transactions on Information Theory, vol. 62, no. 11 (2016): 6535–6579

Abstract

Matrix factorization is a popular approach for large-scale matrix completion. The optimization formulation based on matrix factorization, even at very large scale, can be solved very efficiently in practice through standard optimization algorithms. However, due to the non-convexity caused by the factorization model, there is a limited theo…
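The factorization model described in the abstract can be made concrete with a small sketch. This is an illustrative reconstruction, not the paper's code: the sizes, the mask density, and the names (`M`, `Omega`, `X`, `Y`) are assumptions, and the objective shown is the standard observed-entry least-squares loss ½‖P_Ω(XYᵀ − M)‖_F².

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3                                   # hypothetical size and rank
U_true = rng.standard_normal((n, r))
V_true = rng.standard_normal((n, r))
M = U_true @ V_true.T                          # rank-r ground truth
Omega = rng.random((n, n)) < 0.5               # Boolean mask of revealed entries

def f(X, Y):
    """Objective 0.5 * ||P_Omega(X @ Y.T - M)||_F^2 of the factorization model."""
    R = (X @ Y.T - M) * Omega                  # residual on observed entries only
    return 0.5 * np.sum(R ** 2)

def grads(X, Y):
    """Gradients of f with respect to the two factor matrices."""
    R = (X @ Y.T - M) * Omega
    return R @ Y, R.T @ X                      # df/dX, df/dY
```

At the true factors the observed residual vanishes, so the objective and both gradients are zero there; the difficulty the paper addresses is that this objective is non-convex in (X, Y).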

Introduction

- In the era of big data, there has been an increasing need for handling the enormous amount of data generated by mobile devices, sensors, online merchants, social networks, etc.
- The authors partially address this limited theoretical understanding by showing that, under conditions similar to those used in previous works, many standard optimization algorithms for a factorization-based formulation (see (11)) converge to the true low-rank matrix.
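As one concrete instance of such a standard algorithm, plain gradient descent on the objective ½‖P_Ω(XYᵀ − M)‖_F² can be sketched as below. The step size, iteration count, and small random initialization are illustrative assumptions; the paper's Algorithm 1 pairs gradient descent with a separate INITIALIZE procedure not reproduced here.

```python
import numpy as np

def gradient_descent(M, Omega, r, eta=0.02, iters=3000, seed=2):
    """Plain gradient descent on f(X, Y) = 0.5 * ||P_Omega(X @ Y.T - M)||_F^2."""
    rng = np.random.default_rng(seed)
    n1, n2 = M.shape
    X = 0.1 * rng.standard_normal((n1, r))     # small random init (an assumption;
    Y = 0.1 * rng.standard_normal((n2, r))     # the paper uses a dedicated INITIALIZE step)
    for _ in range(iters):
        R = (X @ Y.T - M) * Omega              # residual on revealed entries
        X, Y = X - eta * (R @ Y), Y - eta * (R.T @ X)   # simultaneous update of both factors
    return X, Y
```

On a small synthetic instance with enough revealed entries, this drives the observed residual to essentially zero, matching the kind of exact-recovery behavior the paper analyzes.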

Highlights

- In the convex approach, the whole matrix is the optimization variable, and the nuclear norm of this matrix variable, which can be viewed as a convex approximation of its rank, serves as the objective function or a regularization term
- Our result provides a validation of the matrix factorization based formulation itself, rather than a validation of a single algorithm
- The first BCD-type algorithm we present is AltMin, which, in the context of matrix completion, usually refers to the algorithm that alternates between the two factor matrices, updating one factor at a time with the other held fixed
- The fourth algorithm we present is stochastic gradient descent [1, 21], tailored to our problem (P1)
- As in the results for nuclear norm minimization [4, 5, 6, 7], the probability is taken with respect to the random choice of Ω; "with probability 99%" means that, among all possible sets Ω of a given size, 99% lead to exact reconstruction by Algorithms 1–4
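The AltMin scheme mentioned above can be sketched as alternating least squares: with one factor held fixed, each row of the other factor solves a small least-squares problem over that row's revealed entries, and then the roles swap. This is an illustrative reconstruction, not the paper's exact Algorithm 2; the random initialization and the tiny ridge term `reg` (added purely for numerical safety) are assumptions.

```python
import numpy as np

def altmin(M, Omega, r, iters=50, reg=1e-9, seed=1):
    """Two-block alternating minimization for matrix completion.

    With Y fixed, row i of X minimizes sum over observed j of (x_i . y_j - M_ij)^2,
    a least-squares problem in r variables; then X and Y swap roles."""
    n1, n2 = M.shape
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n1, r))
    Y = rng.standard_normal((n2, r))
    I = reg * np.eye(r)
    for _ in range(iters):
        for i in range(n1):                        # update X with Y fixed
            cols = np.flatnonzero(Omega[i])
            A, b = Y[cols], M[i, cols]
            X[i] = np.linalg.solve(A.T @ A + I, A.T @ b)
        for j in range(n2):                        # update Y with X fixed
            rows = np.flatnonzero(Omega[:, j])
            A, b = X[rows], M[rows, j]
            Y[j] = np.linalg.solve(A.T @ A + I, A.T @ b)
    return X, Y
```

Each inner step exactly minimizes the objective over one block, which is the block-minimization structure the BCD-type analysis in the paper relies on.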

Results

- The authors remark that the results apply to other versions of SGD with different update orders or step-size rules, as long as they converge to stationary points and satisfy one condition of Proposition V.1.
- The main result of this paper is that Algorithms 1–4 converge to the global optima of problem (P1) given in (11) and reconstruct the true low-rank matrix exactly with high probability, provided that the number of revealed entries is large enough.
- Unlike all previous works on AltMin for matrix completion, the result does not require the algorithm to use independent samples in different iterations.
- To prove Theorem III.1, the authors need only prove two lemmas, which describe the properties of the problem formulation (P1) and the properties of the algorithms, respectively.
- Throughout the paper, "under the same condition of Lemma III.1" means that the objective is defined by (9) and Ω is generated uniformly at random with size ∣Ω∣ satisfying (17), with the same numerical constants as those in Lemma III.1.
- Corollary III.1: under the same conditions as Theorem III.1, any algorithm satisfying Properties (a) and (b) in Lemma III.2 reconstructs the true matrix exactly with probability at least 1 − 2/n⁴.
- An upper bound on the norm of the projected residual: for the quantities defined in Table VI, (25a) holds.
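A per-entry update of the kind Algorithm 4 (SGD) performs can be sketched as below. The step size and the sampling order in the usage example are illustrative assumptions rather than the paper's exact rules; as the remark above notes, the analysis is said to cover such variants as long as they converge to stationary points.

```python
import numpy as np

def sgd_step(X, Y, M, i, j, eta):
    """One stochastic update on revealed entry (i, j): a gradient step on
    0.5 * (x_i . y_j - M_ij)^2 with respect to the rows x_i and y_j."""
    err = X[i] @ Y[j] - M[i, j]
    gx, gy = err * Y[j], err * X[i]     # partial gradients, both using the old values
    X[i] -= eta * gx
    Y[j] -= eta * gy
    return X, Y
```

Sweeping repeatedly over a shuffled list of revealed entries with a small constant step drives the observed-entry loss toward zero on an easy synthetic instance.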

Conclusion

- Property (a) in Lemma III.2 is a basic requirement for many reasonable algorithms and can be proved using classical results in optimization, so the difficulty mainly lies in how to prove Property (b).
- The second condition means that the new iterate is the minimizer of a tight convex upper bound of the original function along the update direction, and it holds for BCD-type methods such as Algorithm 2 and Algorithm 3.

- Table 1: Initialization procedure (INITIALIZE)
- Table 2: Algorithm 4 (SGD)
- Table 3: Algorithm 1 (Gradient descent)
- Table 4: Algorithm 3 (Row BSUM)
- Table 5: Definitions
- Table 6: Algorithm 2 (Two-block alternating minimization)

Funding

- This work is supported in part by a Doctoral Dissertation Fellowship from the Graduate School of the University of Minnesota

References

- [2] P. Chen and D. Suter, “Recovering the missing components in a large noisy low-rank matrix: Application to SFM,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 8, pp. 1051–1063, 2004.
- [3] Z. Liu and L. Vandenberghe, “Interior-point method for nuclear norm approximation with application to system identification,” SIAM Journal on Matrix Analysis and Applications, vol. 31, no. 3, pp. 1235–1256, 2009.
- [4] E. J. Candes and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717–772, 2009.
- [5] E. J. Candes and T. Tao, “The power of convex relaxation: Near-optimal matrix completion,” IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2053–2080, 2010.
- [6] D. Gross, “Recovering low-rank matrices from few coefficients in any basis,” IEEE Transactions on Information Theory, vol. 57, no. 3, pp. 1548–1566, 2011.
- [7] B. Recht, “A simpler approach to matrix completion,” The Journal of Machine Learning Research, vol. 12, pp. 3413–3430, 2011.
- [8] J.-F. Cai, E. J. Candes, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
- [11] A. Agarwal, S. Negahban, and M. J. Wainwright, “Fast global convergence of gradient methods for high-dimensional statistical recovery,” The Annals of Statistics, vol. 40, no. 5, pp. 2452–2482, 2012.
- [12] K. Hou, Z. Zhou, A. M.-C. So, and Z.-Q. Luo, “On the linear convergence of the proximal gradient method for trace norm regularization,” in Advances in Neural Information Processing Systems (NIPS), 2013, pp. 710–718.
- [15] H. Keshavan, “Efficient algorithms for collaborative filtering,” Ph.D. dissertation, Stanford University, 2012.
- [16] P. Jain, P. Netrapalli, and S. Sanghavi, “Low-rank matrix completion using alternating minimization,” in Proceedings of the forty-fifth annual ACM symposium on Theory of computing (STOC). ACM, 2013, pp. 665–674.
- [17] M. Hardt, “Understanding alternating minimization for matrix completion,” in 2014 IEEE 55th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2014, pp. 651–660.
- [20] Z. Wen, W. Yin, and Y. Zhang, “Solving a low-rank factorization model for matrix completion by a nonlinear successive overrelaxation algorithm,” Mathematical Programming Computation, vol. 4, no. 4, pp. 333–361, 2012.
- [22] A. Paterek, “Improving regularized singular value decomposition for collaborative filtering,” in Proceedings of KDD cup and workshop, vol. 2007, 2007, pp. 5–8.
- [24] B. Recht and C. Re, “Parallel stochastic gradient algorithms for large-scale matrix completion,” Mathematical Programming Computation, vol. 5, no. 2, pp. 201–226, 2013.
- [25] Y. Zhuang, W.-S. Chin, Y.-C. Juan, and C.-J. Lin, “A fast parallel SGD for matrix factorization in shared memory systems,” in Proceedings of the 7th ACM Conference on Recommender Systems. ACM, 2013, pp. 249–256.
- [26] I. Pilaszy, D. Zibriczky, and D. Tikk, “Fast ALS-based matrix factorization for explicit and implicit feedback datasets,” in Proceedings of the fourth ACM conference on Recommender Systems. ACM, 2010, pp. 71–78.
- [27] H.-F. Yu, C.-J. Hsieh, S. Si, and I. S. Dhillon, “Scalable coordinate descent approaches to parallel matrix factorization for recommender systems,” in ICDM, 2012, pp. 765–774.
- [28] T. Hastie, R. Mazumder, J. Lee, and R. Zadeh, “Matrix completion and low-rank SVD via fast alternating least squares,” arXiv preprint arXiv:1410.2596, 2014.
- [29] R. Sun, “Matrix completion via nonconvex factorization: Algorithms and theory,” Ph.D. dissertation, University of Minnesota, 2015.
- [30] R. H. Keshavan, A. Montanari, and S. Oh, “Matrix completion from a few entries,” IEEE Transactions on Information Theory, vol. 56, no. 6, pp. 2980–2998, 2010.
- [31] E. Candes, X. Li, and M. Soltanolkotabi, “Phase retrieval via Wirtinger flow: Theory and algorithms,” arXiv preprint arXiv:1407.1065, 2014.
- [32] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, “Quantum state tomography via compressed sensing,” arXiv preprint, http://arxiv.org/abs/0909.3304v1, 2009.
- [33] P. Jain and P. Netrapalli, “Fast exact matrix completion with finite samples,” arXiv preprint arXiv:1411.1087, 2014.
- [34] C. De Sa, K. Olukotun, and C. Re, “Global convergence of stochastic gradient descent for some nonconvex matrix problems,” arXiv preprint arXiv:1411.1134, 2014.
- [35] P. Netrapalli, P. Jain, and S. Sanghavi, “Phase retrieval using alternating minimization,” in Advances in Neural Information Processing Systems (NIPS), 2013, pp. 2796–2804.
- [36] C.-H. Zhang and T. Zhang, “A general theory of concave regularization for high-dimensional sparse estimation problems,” Statistical Science, vol. 27, no. 4, pp. 576–593, 2012.
- [37] P.-L. Loh and M. Wainwright, “Regularized m-estimators with nonconvexity: Statistical and algorithmic theory for local optima,” in Advances in Neural Information Processing Systems, 2013, pp. 476–484.
- [38] J. Fan, L. Xue, and H. Zou, “Strong oracle optimality of folded concave penalized estimation,” The Annals of Statistics, vol. 42, no. 3, pp. 819–849, 2014.
- [39] X.-T. Yuan and T. Zhang, “Truncated power method for sparse eigenvalue problems,” The Journal of Machine Learning Research, vol. 14, no. 1, pp. 899–925, 2013.
- [40] Z. Wang, H. Lu, and H. Liu, “Nonconvex statistical optimization: Minimax-optimal sparse PCA in polynomial time,” arXiv preprint arXiv:1408.5352, 2014.
- [41] P. Netrapalli, U. Niranjan, S. Sanghavi, A. Anandkumar, and P. Jain, “Non-convex robust PCA,” in Advances in Neural Information Processing Systems, 2014, pp. 1107–1115.
- [42] S. Balakrishnan, M. Wainwright, and B. Yu, “Statistical guarantees for the EM algorithm: From population to sample-based analysis,” arXiv preprint arXiv:1408.2156, 2014.
- [43] Z. Wang, Q. Gu, Y. Ning, and H. Liu, “High dimensional expectation-maximization algorithm: Statistical optimization and asymptotic normality,” arXiv preprint arXiv:1412.8729, 2014.
- [44] U. Feige and E. Ofek, “Spectral techniques applied to sparse random graphs,” Random Structures & Algorithms, vol. 27, no. 2, pp. 251–275, 2005.
- [46] Y. Chen, S. Bhojanapalli, S. Sanghavi, and R. Ward, “Coherent matrix completion,” in Proceedings of The 31st International Conference on Machine Learning (ICML), 2014, pp. 674–682.
- [47] S. Bhojanapalli and P. Jain, “Universal matrix completion,” arXiv preprint arXiv:1402.2324, 2014.
- [48] W. I. Zangwill, “Non-linear programming via penalty functions,” Management Science, vol. 13, no. 5, pp. 344–358, 1967.
- [49] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, 1999.
- [50] P. Tseng, “Convergence of a block coordinate descent method for nondifferentiable minimization,” Journal of Optimization Theory and Applications, vol. 109, no. 3, pp. 475–494, 2001.
- [51] Y. Nesterov, “Efficiency of coordinate descent methods on huge-scale optimization problems,” SIAM Journal on Optimization, vol. 22, no. 2, pp. 341–362, 2012.
- [52] M. Razaviyayn, M. Hong, and Z.-Q. Luo, “A unified convergence analysis of block successive minimization methods for nonsmooth optimization,” SIAM Journal on Optimization, vol. 23, no. 2, pp. 1126–1153, 2013.
- [53] H. Baligh, M. Hong, W.-C. Liao, Z.-Q. Luo, M. Razaviyayn, M. Sanjabi, and R. Sun, “Cross-layer provision of future cellular networks: A WMMSE-based approach,” IEEE Signal Processing Magazine, vol. 31, no. 6, pp. 56–68, 2014.
- [54] M. Hong, R. Sun, H. Baligh, and Z.-Q. Luo, “Joint base station clustering and beamformer design for partial coordinated transmission in heterogeneous networks,” IEEE Journal on Selected Areas in Communications (JSAC), vol. 31, no. 2, pp. 226–240, February 2013.
- [56] R. Sun, Z.-Q. Luo, and Y. Ye, “On the expected convergence of randomly permuted ADMM,” arXiv preprint arXiv:1503.06387, 2015.
- [58] D. P. Bertsekas and J. N. Tsitsiklis, “Gradient convergence in gradient methods with errors,” SIAM Journal on Optimization, vol. 10, no. 3, pp. 627–642, 2000.
- [59] G. W. Stewart, “Perturbation theory for the singular value decomposition,” 1998.
