Prompting a Pretrained Transformer Can Be a Universal Approximator
CoRR (2024)
Abstract
Despite the widespread adoption of prompting, prompt tuning and prefix-tuning
of transformer models, our theoretical understanding of these fine-tuning
methods remains limited. A key question is whether one can arbitrarily modify
the behavior of a pretrained model by prompting or prefix-tuning it. Formally,
we ask whether prompting and prefix-tuning a pretrained model can universally
approximate sequence-to-sequence functions. This paper answers in the
affirmative and demonstrates that pretrained models much smaller than
previously thought can be universal approximators when prefixed. In fact, the
attention mechanism is uniquely suited for universal approximation, with
prefix-tuning of a single attention head being sufficient to approximate any
continuous function. Moreover, any sequence-to-sequence function can be
approximated by prefixing a transformer with depth linear in the sequence
length. Beyond these density-type results, we also offer Jackson-type bounds on
the length of the prefix needed to approximate a function to a desired
precision.
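
The abstract's key construction, prefix-tuning a single attention head, has a compact form: trainable key/value vectors are prepended to the attention context while all pretrained weights stay frozen. Below is a minimal NumPy sketch of that setup; the function name `prefix_attention` and the parameter names `P_k`, `P_v` are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prefix_attention(X, W_q, W_k, W_v, P_k, P_v):
    """Single attention head with a trainable prefix (hypothetical sketch).

    X            : (n, d) input sequence
    W_q, W_k, W_v: (d, d) frozen pretrained projections
    P_k, P_v     : (m, d) trainable prefix keys/values, the only free
                   parameters under prefix-tuning
    """
    Q = X @ W_q
    K = np.vstack([P_k, X @ W_k])  # prefix contributes m extra keys
    V = np.vstack([P_v, X @ W_v])  # ...and m extra values
    A = softmax(Q @ K.T / np.sqrt(X.shape[-1]))  # (n, m+n) attention weights
    return A @ V                   # (n, d) output

# The abstract's claim, in these terms: for a suitable frozen head, choosing
# P_k and P_v alone can drive the output arbitrarily close to any target
# continuous function, with Jackson-type bounds relating the prefix
# length m to the approximation error.
rng = np.random.default_rng(0)
n, d, m = 4, 8, 3
X = rng.standard_normal((n, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
P_k, P_v = rng.standard_normal((m, d)), rng.standard_normal((m, d))
print(prefix_attention(X, W_q, W_k, W_v, P_k, P_v).shape)  # (4, 8)
```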