Structural Pruning of Pre-trained Language Models via Neural Architecture Search
arXiv (2024)
Abstract
Pre-trained language models (PLMs), for example BERT or RoBERTa, mark the
state of the art for natural language understanding tasks when fine-tuned on
labeled data. However, their large size poses challenges in deploying them for
inference in real-world applications, due to significant GPU memory
requirements and high inference latency. This paper explores neural
architecture search (NAS) for structural pruning to find sub-parts of the
fine-tuned network that optimally trade off efficiency, for example in terms of
model size or latency, against generalization performance. We also show how
recently developed two-stage weight-sharing NAS approaches can be utilized in
this setting to accelerate the search process. Unlike traditional pruning
methods with fixed thresholds, we propose a multi-objective approach that
identifies the Pareto-optimal set of sub-networks, allowing for a more flexible
and automated compression process.
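The abstract's core idea of returning a Pareto-optimal set of sub-networks rather than a single thresholded model can be illustrated with a minimal sketch. The following Python snippet is not the paper's implementation; the `SubNetwork` class, the two objectives (parameter count and validation error), and the candidate values are illustrative assumptions used only to show how a Pareto front over pruned sub-networks might be selected.

```python
# Minimal sketch (illustrative, not the paper's code): given candidate sub-networks
# scored on two objectives -- parameter count and validation error -- keep only the
# Pareto-optimal ones, i.e. those not dominated by any other candidate.
from dataclasses import dataclass


@dataclass
class SubNetwork:
    name: str
    params_m: float   # model size in millions of parameters (lower is better)
    val_error: float  # validation error on the target task (lower is better)


def dominates(a: SubNetwork, b: SubNetwork) -> bool:
    """True if `a` is at least as good as `b` on both objectives and strictly better on one."""
    return (a.params_m <= b.params_m and a.val_error <= b.val_error
            and (a.params_m < b.params_m or a.val_error < b.val_error))


def pareto_front(candidates: list[SubNetwork]) -> list[SubNetwork]:
    """Return the candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]


if __name__ == "__main__":
    # Placeholder numbers, not results from the paper.
    candidates = [
        SubNetwork("full",    110.0, 0.085),
        SubNetwork("prune-a",  66.0, 0.091),
        SubNetwork("prune-b",  66.0, 0.104),  # dominated by prune-a
        SubNetwork("prune-c",  40.0, 0.112),
    ]
    for net in pareto_front(candidates):
        print(net)
```

In practice, each candidate would correspond to a structurally pruned sub-part of the fine-tuned PLM evaluated under a weight-sharing scheme, and the returned front gives the user a menu of size/accuracy trade-offs instead of a single fixed-threshold model.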