Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization
NeurIPS 2024 (2024)
Abstract
Bi-level optimization (BO) has become a fundamental mathematical framework for addressing hierarchical machine learning problems. As deep learning models continue to grow in size, the demand for scalable bi-level optimization has become increasingly critical. Traditional gradient-based bi-level optimization algorithms, due to their inherent characteristics, are ill-suited to meet the demands of large-scale applications. In this paper, we introduce **F**orward **G**radient **U**nrolling with **F**orward **G**radient, abbreviated as **$(FG)^2U$**, which achieves an unbiased stochastic approximation of the meta gradient for bi-level optimization. $(FG)^2U$ circumvents the memory and approximation issues associated with classical bi-level optimization approaches, and delivers significantly more accurate gradient estimates than existing large-scale bi-level optimization approaches. Additionally, $(FG)^2U$ is inherently designed to support parallel computing, enabling it to effectively leverage large-scale distributed computing systems and achieve substantial computational efficiency. In practice, $(FG)^2U$ and other methods can be strategically placed at different stages of the training process to achieve a more cost-effective two-phase paradigm. Furthermore, $(FG)^2U$ is easy to implement within popular deep learning frameworks, and can be conveniently adapted to address more challenging zeroth-order bi-level optimization scenarios. We provide a thorough convergence analysis and a comprehensive practical discussion for $(FG)^2U$, complemented by extensive empirical evaluations, showcasing its superior performance in diverse large-scale bi-level optimization tasks.
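To make the core idea concrete, below is a minimal sketch of combining gradient unrolling with a forward-gradient estimator, written in JAX on a toy hyperparameter-tuning problem. The toy ridge objective, function names, step count, and learning rates are illustrative assumptions for exposition, not the authors' implementation.

```python
import jax
import jax.numpy as jnp

def inner_step(w, lam, x, y):
    # One gradient step on the inner (lower-level) objective: a toy
    # ridge regression whose weight-decay strength lam is the outer
    # (upper-level) variable. Purely illustrative.
    def inner_loss(w):
        resid = x @ w - y
        return jnp.mean(resid ** 2) + jax.nn.softplus(lam) * jnp.sum(w ** 2)
    return w - 0.1 * jax.grad(inner_loss)(w)

def unroll(lam, w0, x, y, steps=20):
    # Gradient unrolling: run the inner optimizer for `steps` iterations.
    w = w0
    for _ in range(steps):
        w = inner_step(w, lam, x, y)
    return w

def outer_loss(lam, w0, x_tr, y_tr, x_val, y_val):
    # Upper-level objective: validation loss of the unrolled solution.
    w = unroll(lam, w0, x_tr, y_tr)
    return jnp.mean((x_val @ w - y_val) ** 2)

def fg_estimate(key, lam, w0, x_tr, y_tr, x_val, y_val):
    # Forward gradient: sample a random direction v, push it through the
    # unrolled computation with forward-mode AD (jvp), and scale v by the
    # resulting directional derivative. Since E[v v^T] = I for
    # v ~ N(0, I), E[(grad F . v) v] = grad F, i.e. the estimate is
    # unbiased. Forward mode carries only a tangent alongside the primal
    # pass, so the unrolled trajectory is never stored.
    v = jax.random.normal(key, lam.shape)
    f = lambda l: outer_loss(l, w0, x_tr, y_tr, x_val, y_val)
    _, dir_deriv = jax.jvp(f, (lam,), (v,))
    return dir_deriv * v
```

Because forward-mode differentiation does not retain the inner-loop trajectory, memory cost is independent of the unroll length, unlike backpropagation through the unrolled loop. Averaging `fg_estimate` over many independent directions `v` reduces variance, and those evaluations are independent, which is the sense in which such an estimator parallelizes naturally across devices.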
Key words
bi-level optimization, large-scale optimization, meta learning