What to Do When Your Discrete Optimization Is the Size of a Neural Network?
CoRR (2024)
Abstract
Oftentimes, machine learning applications using neural networks involve
solving discrete optimization problems, such as in pruning,
parameter-isolation-based continual learning, and the training of binary
networks. These discrete problems are combinatorial in nature and are not
amenable to gradient-based optimization. Additionally, classical approaches
used in discrete settings do not scale well to large neural networks, forcing
practitioners to rely on alternative methods. Among these, two main distinct
sources of top-down information can be used to lead the model to good
solutions: (1) extrapolating gradient information from points outside of the
solution set, and (2) comparing evaluations between members of a subset of the
valid solutions. We take continuation path (CP) methods to represent relying
purely on the former and Monte Carlo (MC) methods to represent the latter
(both are sketched in toy code below), while also noting that some hybrid
methods combine the two. The main goal of this work is to compare the two
approaches. For that purpose, we first overview the two classes while also
discussing some of their drawbacks analytically. Then, in the experimental
section, we compare their performance, starting with smaller microworld
experiments, which allow more fine-grained control of problem variables, and
gradually moving towards larger problems, including neural network regression
and neural network pruning for image classification, where we additionally
compare against magnitude-based pruning.