Tractability from overparametrization: the example of the negative perceptron

Probability Theory and Related Fields (2024)

Abstract
In the negative perceptron problem we are given $n$ data points $(x_i, y_i)$, where $x_i$ is a $d$-dimensional vector and $y_i \in \{+1, -1\}$ is a binary label. The data are not linearly separable, and hence we content ourselves with finding a linear classifier with the largest possible negative margin. In other words, we want to find a unit-norm vector $\theta$ that maximizes $\min_{i \le n} y_i \langle \theta, x_i \rangle$. This is a non-convex optimization problem (it is equivalent to finding a maximum-norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which $n, d \to \infty$ with $n/d \to \delta$, and prove upper and lower bounds on the maximum margin $\kappa_s(\delta)$ or, equivalently, on its inverse function $\delta_s(\kappa)$. Thus $\delta_s(\kappa)$ is the overparametrization threshold: for $n/d \le \delta_s(\kappa) - \varepsilon$ a classifier achieving vanishing training error exists with high probability, while for $n/d \ge \delta_s(\kappa) + \varepsilon$ it does not. Our bounds on $\delta_s(\kappa)$ match to leading order as $\kappa \to -\infty$. We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold $\delta_{\mathrm{lin}}(\kappa)$. We observe a gap between the interpolation threshold $\delta_s(\kappa)$ and the linear programming threshold $\delta_{\mathrm{lin}}(\kappa)$, raising the question of the behavior of other algorithms.
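The polytope equivalence mentioned in the abstract rests on a short convexity argument. Let $P_\kappa = \{\theta : y_i \langle \theta, x_i \rangle \ge \kappa \text{ for all } i\}$. For $\kappa < 0$ the origin lies in $P_\kappa$, and $P_\kappa$ is convex, so if some $\theta \in P_\kappa$ has $\|\theta\| \ge 1$, then $\theta / \|\theta\|$ is also in $P_\kappa$. Hence a unit-norm classifier with margin at least $\kappa$ exists exactly when $P_\kappa$ contains a point of norm at least 1. The sketch below checks this numerically in plain Python. It assumes a pure-noise data model ($x_i \sim \mathsf{N}(0, I_d)$, labels $y_i$ uniform on $\{+1, -1\}$ and independent of $x_i$), which is an illustrative choice and not necessarily one of the paper's two models; all helper names (`pure_noise_instance`, `min_margin`, `in_polytope`, `rescale_to_sphere`) are hypothetical.

```python
import math
import random

# Hypothetical helpers for illustration; this is not code from the paper.


def dot(u, v):
    """Euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm ||u||."""
    return math.sqrt(dot(u, u))


def pure_noise_instance(n, d, rng):
    """Assumed data model: x_i ~ N(0, I_d), labels y_i uniform on {+1, -1}
    and independent of x_i."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [rng.choice((+1, -1)) for _ in range(n)]
    return X, y


def min_margin(theta, X, y):
    """min_i y_i <theta, x_i>; the problem maximizes this over unit-norm theta."""
    return min(yi * dot(theta, xi) for xi, yi in zip(X, y))


def in_polytope(theta, X, y, kappa):
    """Membership in P_kappa = {theta : y_i <theta, x_i> >= kappa for all i}."""
    return all(yi * dot(theta, xi) >= kappa for xi, yi in zip(X, y))


def rescale_to_sphere(theta, X, y, kappa):
    """Map a point of P_kappa with norm >= 1 to a unit-norm point of P_kappa.

    For kappa < 0, y_i <theta / r, x_i> >= kappa / r >= kappa whenever r >= 1,
    which is the convexity argument specialized to the segment [0, theta].
    """
    if kappa >= 0:
        raise ValueError("the negative perceptron requires kappa < 0")
    r = norm(theta)
    if r < 1 or not in_polytope(theta, X, y, kappa):
        raise ValueError("theta must lie in P_kappa and have norm at least 1")
    return [t / r for t in theta]


if __name__ == "__main__":
    rng = random.Random(0)
    d, delta = 50, 2.0  # proportional regime: n = delta * d
    n = int(delta * d)
    X, y = pure_noise_instance(n, d, rng)
    # An arbitrary point with norm well above 1; it lies in P_kappa for
    # kappa equal to its own (unnormalized) minimum margin.
    theta = [3.0 * rng.gauss(0.0, 1.0) for _ in range(d)]
    kappa = min_margin(theta, X, y)
    u = rescale_to_sphere(theta, X, y, kappa)
    print(f"kappa = {kappa:.3f}, ||u|| = {norm(u):.6f}, "
          f"margin(u) = {min_margin(u, X, y):.3f}")
```

The rescaled vector has margin $\kappa / \|\theta\|$, which is strictly larger than $\kappa$ when $\kappa < 0$ and $\|\theta\| > 1$. The linear programming algorithm analyzed in the paper searches $P_\kappa$ for a suitable point; since the abstract does not specify its objective, it is not reproduced here.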
Key words
60D05 Geometric probability and stochastic geometry; 68T07 Artificial neural networks and deep learning; 82B44 Disordered systems (random Ising models, random Schrödinger operators, etc.) in equilibrium statistical mechanics