Efficient Pruning of Large Language Model with Adaptive Estimation Fusion
CoRR (2024)
Abstract
Large language models (LLMs) have become crucial for many generative
downstream tasks, making their efficient deployment on resource-constrained
devices both an inevitable trend and a significant challenge. Structured
pruning is a widely used method to address this challenge. However, when
dealing with the complex structure of multiple decoder layers, general methods
often employ common estimation approaches for pruning, which leads to a
decline in accuracy on specific downstream tasks. In this paper, we introduce
a simple yet efficient method that adaptively models the importance of each
substructure. Meanwhile, it can adaptively fuse coarse-grained and
fine-grained estimations based on the results from complex and multilayer
structures. All aspects of our design integrate seamlessly into the end-to-end
pruning
framework. Our experimental results, compared with state-of-the-art methods on
mainstream datasets, demonstrate average accuracy improvements of 1.1% and
2.0%, respectively.
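The core idea described above — scoring each substructure with both a coarse-grained and a fine-grained importance estimate and blending the two before pruning — can be illustrated with a minimal sketch. Note that the function names, the normalization scheme, and the blending weight `alpha` below are illustrative assumptions for exposition, not the paper's actual formulation, which derives the fusion adaptively from the multilayer structure.

```python
import numpy as np

def fused_importance(coarse, fine, alpha):
    """Blend a coarse-grained and a fine-grained importance estimate.

    Each estimate is normalized to unit L1 mass so the two are on a
    comparable scale before the convex combination (a simplifying
    assumption; the paper learns the fusion adaptively).
    """
    c = coarse / (np.abs(coarse).sum() + 1e-12)
    f = fine / (np.abs(fine).sum() + 1e-12)
    return alpha * c + (1.0 - alpha) * f

def prune_mask(scores, sparsity):
    """Keep the top-(1 - sparsity) fraction of substructures by score."""
    k = int(round(len(scores) * (1.0 - sparsity)))
    keep = np.argsort(scores)[::-1][:k]
    mask = np.zeros(len(scores), dtype=bool)
    mask[keep] = True
    return mask

# Toy example: 8 attention heads with hypothetical importance estimates.
coarse = np.array([0.9, 0.1, 0.5, 0.3, 0.8, 0.2, 0.4, 0.6])
fine   = np.array([0.7, 0.2, 0.6, 0.1, 0.9, 0.3, 0.5, 0.4])

mask = prune_mask(fused_importance(coarse, fine, alpha=0.5), sparsity=0.25)
print(mask)  # boolean keep-mask over the 8 heads; 6 kept at 25% sparsity
```

In this sketch the two lowest-scoring heads are masked out; a real structured-pruning pipeline would then physically remove the corresponding weight slices and optionally fine-tune.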