Interplay between depth and width for interpolation in neural ODEs
CoRR (2024)
Abstract
Neural ordinary differential equations (neural ODEs) have emerged as a
natural tool for supervised learning from a control perspective, yet a complete
understanding of their optimal architecture remains elusive. In this work, we
examine the interplay between their width p and number of layer transitions
L (effectively the depth L+1). Specifically, we assess the model
expressivity in terms of its capacity to interpolate either a finite dataset
D comprising N pairs of points or two probability measures in
ℝ^d within a Wasserstein error margin ε>0. Our findings
reveal a balancing trade-off between p and L, with L scaling as
O(1+N/p) for dataset interpolation, and
L = O(1 + (p ε^d)^{-1}) for measure interpolation.
In the autonomous case, where L=0, a separate study is required, which we
undertake focusing on dataset interpolation. We address the relaxed problem of
ε-approximate controllability and establish an error decay of
ε ~ O(log(p) p^{-1/d}). This decay rate is a consequence of
applying a universal approximation theorem to a custom-built Lipschitz vector
field that interpolates D. In the high-dimensional setting, we further
demonstrate that p=O(N) neurons are likely sufficient to achieve exact
control.
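The depth–width trade-off stated above can be sketched numerically. The helper `required_transitions` and the constant `C` below are illustrative assumptions (big-O hides the true constants), not quantities from the paper:

```python
import math

def required_transitions(N: int, p: int, C: float = 1.0) -> int:
    """Illustrative estimate of the number of layer transitions L.

    The abstract states that L scales as O(1 + N/p) for interpolating
    a dataset of N pairs with width p: widening the network shrinks
    the required depth. C is a hypothetical constant absorbed by the
    big-O notation.
    """
    return math.ceil(C * (1 + N / p))

# Interpolating N = 1000 pairs: depth falls roughly like N/p as p grows.
N = 1000
depths = {p: required_transitions(N, p) for p in (10, 100, 1000)}
print(depths)  # → {10: 101, 100: 11, 1000: 2}
```

With width p = N (one neuron per data pair), the estimate gives L = O(1) transitions, consistent with the balancing trade-off described in the abstract.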