# Suvrit Sra

Associate Professor

Suvrit’s research bridges machine learning with a variety of mathematical topics, including optimization, matrix theory, differential geometry, and probability. His recent work focuses on the foundations of geometric optimization, an emerging subarea of nonconvex optimization where geometry (often non-Euclidean) enables efficient computation of global optima. More broadly, his work spans a wide range of topics in optimization, especially in machine learning, statistics, signal processing, and related areas. He is pursuing novel applications of machine learning and optimization to materials science, quantum chemistry, synthetic biology, healthcare, and other data-driven domains.
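As a minimal illustration of the geometric-optimization idea (a sketch written for this page, not code from any of the papers below): the Karcher mean of positive definite matrices is nonconvex in the Euclidean sense but geodesically convex, so Riemannian gradient descent reaches the global optimum. The function name `karcher_mean` and the step size are choices made here, assuming NumPy/SciPy.

```python
import numpy as np
from scipy.linalg import sqrtm, logm, expm, inv

def karcher_mean(mats, iters=50, step=1.0):
    """Riemannian gradient descent for the Karcher (geometric) mean of
    symmetric positive definite matrices.  The objective sum_i d(X, A_i)^2
    (d = affine-invariant Riemannian distance) is geodesically convex,
    so this local iteration converges to the global minimizer."""
    X = sum(mats) / len(mats)            # arithmetic mean as a starting point
    for _ in range(iters):
        Xh = sqrtm(X)                    # X^{1/2}
        Xih = inv(Xh)                    # X^{-1/2}
        # negative Riemannian gradient, expressed in the tangent space at X
        G = sum(logm(Xih @ A @ Xih) for A in mats) / len(mats)
        # move along the geodesic via the exponential map
        X = Xh @ expm(step * G) @ Xh
        X = np.real((X + X.T) / 2)       # symmetrize against round-off
    return X
```

For commuting inputs the iteration reproduces the elementwise geometric mean, e.g. the mean of `diag(1, 4)` and `diag(4, 1)` is `2·I`.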

His work has won several awards at machine learning conferences, the 2011 “SIAM Outstanding Paper” award, faculty research awards from Criteo and Amazon, and an NSF CAREER award. In addition, Suvrit founded (and regularly co-chairs) the popular OPT “Optimization for Machine Learning” series of workshops at the Conference on Neural Information Processing Systems (NIPS). He has also edited a well-received book of the same title (MIT Press, 2011).

Suvrit has devoted significant effort to teaching as well. He has been an invited lecturer on optimization at the Machine Learning Summer School (MLSS) and at numerous other specialized short courses. He revamped the Berkeley graduate course Introduction to Convex Optimization, developed a new advanced course on optimization at CMU, and has regularly co-taught the graduate and undergraduate machine learning courses in EECS at MIT.

## Papers (181 papers)

Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity

Escaping Saddle Points with Adaptive Gradient Methods.

Learning Determinantal Point Processes by Corrective Negative Sampling

Analysis of Gradient Clipping and Adaptive Scaling with a Relaxed Smoothness Condition.

Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator

Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity.

Acceleration in First Order Quasi-strongly Convex Optimization by ODE Discretization

Flexible Modeling of Diversity with Strongly Log-Concave Distributions.

Random Shuffling Beats SGD after Finite Epochs

Are deep ResNets provably better than linear predictors?

Distributional Adversarial Networks.

Learning Determinantal Point Processes by Sampling Inferred Negatives.

A Critical View of Global Optimality in Deep Learning.

Non-Linear Temporal Subspace Representations for Activity Recognition.

Direct Runge-Kutta Discretization Achieves Acceleration.

An Estimate Sequence for Geodesically Convex Optimization.

Towards Riemannian Accelerated Gradient Methods.

On Geodesically Convex Formulations for the Brascamp-Lieb Constant.

Finite sample expressive power of small-width ReLU networks.

R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate.

Exponentiated Strongly Rayleigh Distributions.

Deep-RBF Networks Revisited: Robust Classification with Rejection.

New concavity and convexity results for symmetric polynomials and their ratios

Global Optimality Conditions for Deep Neural Networks

A Generic Approach for Escaping Saddle Points

Modular proximal optimization for multidimensional total-variation regularization

Riemannian Dictionary Learning and Sparse Coding for Positive Definite Matrices

Combinatorial Topic Models using Small-Variance Asymptotics.

Unsupervised robust nonparametric learning of hidden community properties.

Elementary Symmetric Polynomials for Optimal Experimental Design.

Polynomial time algorithms for dual volume sampling.

An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization.

Riemannian Frank-Wolfe with application to the geometric mean of positive definite matrices

The sum of squared logarithms inequality in arbitrary dimensions

Efficient Sampling for k-Determinantal Point Processes

Fast Incremental Method for Nonconvex Optimization.

First-order Methods for Geodesically Convex Optimization.

On inequalities for normalized Schur functions

Kronecker Determinantal Point Processes.

Stochastic Frank-Wolfe Methods for Nonconvex Optimization.

Fast Stochastic Methods for Nonsmooth Nonconvex Optimization.

Fast stochastic optimization on Riemannian manifolds.

Entropic metric alignment for correspondence problems.

Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms.

Gaussian quadrature for matrix inverse forms with applications.

AdaDelay: Delay Adaptive Distributed Stochastic Optimization.

Fast incremental method for smooth nonconvex optimization.

Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization.

Riemannian SVRG: Fast Stochastic Optimization on Riemannian Manifolds.

Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling.

A proof of Thompson’s determinantal inequality

On the Matrix Square Root via Geometric Optimization

Conic Geometric Optimization on the Manifold of Positive Definite Matrices.

Efficient Structured Matrix Rank Minimization.

Fixed-point algorithms for learning determinantal point processes

On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants

Matrix Manifold Optimization for Gaussian Mixtures

Diversity Networks: Neural Network Compression Using Determinantal Point Processes

Logarithmic inequalities under an elementary symmetric polynomial dominance order

Towards an optimal stochastic alternating direction method of multipliers.

Randomized Nonlinear Component Analysis.

Efficient nearest neighbors via robust sparse hashing.

Statistical inference with the Elliptical Gamma Distribution

Inference and Mixture Modeling with the Elliptical Gamma Distribution

Large-scale randomized-coordinate descent methods with non-separable linear constraints

Hlawka–Popoviciu inequalities on positive definite tensors

Fast Newton methods for the group fused lasso.

Geometric optimisation on positive definite matrices for elliptically contoured distributions.

A non-monotonic method for large-scale non-negative least squares

Reflection methods for user-friendly submodular optimization.